
 
 


1 Overview
2 Programming Model
3 Instruction and Data Cache Operation
4 Exceptions
5 Memory Management
6 Instruction Timing
7 Signal Descriptions
8 System Interface Operation
9 Power Management
A PowerPC Instruction Set Listings
B Instructions Not Implemented
C PowerPC 603 Processor System Design and Programming Considerations
GLO Glossary
IND Index
MPC603EUM/AD 11/97 Rev. 1
MPC603e & EC603e RISC Microprocessors User's Manual
with Supplement for PowerPC 603 Microprocessor
This document contains information on a new product under development. Motorola reserves the right to change or discontinue this product without notice. Information in this document is provided solely to enable system and software implementers to use PowerPC microprocessors. There are no express or implied copyright licenses granted hereunder to design or fabricate PowerPC integrated circuits or integrated circuits based on the information in this document.

Motorola reserves the right to make changes without further notice to any products herein. Motorola makes no warranty, representation or guarantee regarding the suitability of its products for any particular purpose, nor does Motorola assume any liability arising out of the application or use of any product or circuit, and specifically disclaims any and all liability, including without limitation consequential or incidental damages. "Typical" parameters can and do vary in different applications. All operating parameters, including "Typicals" must be validated for each customer application by customer's technical experts. Motorola does not convey any license under its patent rights nor the rights of others. Motorola products are not designed, intended, or authorized for use as components in systems intended for surgical implant into the body, or other applications intended to support or sustain life, or for any other application in which the failure of the Motorola product could create a situation where personal injury or death may occur. Should Buyer purchase or use Motorola products for any such unintended or unauthorized application, Buyer shall indemnify and hold Motorola and its officers, employees, subsidiaries, affiliates, and distributors harmless against all claims, costs, damages, and expenses, and reasonable attorney fees arising out of, directly or indirectly, any claim of personal injury or death associated with such unintended or unauthorized use, even if such claim alleges that Motorola was negligent regarding the design or manufacture of the part.

Motorola and the Motorola logo are registered trademarks and EC603e is a trademark of Motorola, Inc. Motorola, Inc. is an Equal Opportunity/Affirmative Action Employer. The PowerPC name, the PowerPC logotype, and PowerPC 603 are trademarks of International Business Machines Corporation used by Motorola under license from International Business Machines Corporation.

© Motorola Inc. 1997. All rights reserved. Portions hereof © International Business Machines Corp. 1991-1997. All rights reserved.
Contents

About This Book
Audience
Organization
Suggested Reading
Conventions
Acronyms and Abbreviations
Terminology Conventions

Chapter 1 Overview
1.1 Overview
1.1.1 Features
1.1.2 System Design and Programming Considerations
1.1.2.1 Hardware Features
1.1.2.1.1 Replacement of XATS Signal by CSE1 Signal
1.1.2.1.2 Addition of Half-Clock Bus Multipliers
1.1.2.2 Software Features
1.1.2.2.1 16-Kbyte Instruction and Data Caches
1.1.2.2.2 Clock Configuration Available in HID1 Register
1.1.2.2.3 Performance Enhancements
1.1.3 Instruction Unit
1.1.3.1 Instruction Queue and Dispatch Unit
1.1.3.2 Branch Processing Unit (BPU)
1.1.4 Independent Execution Units
1.1.4.1 Integer Unit (IU)
1.1.4.2 Floating-Point Unit (FPU)
1.1.4.3 Load/Store Unit (LSU)
1.1.4.4 System Register Unit (SRU)
1.1.4.5 Completion Unit
1.1.5 Memory Subsystem Support
1.1.5.1 Memory Management Units (MMUs)
1.1.5.2 Cache Units
1.1.6 Processor Bus Interface
1.1.7 System Support Functions
1.1.7.1 Power Management
1.1.7.2 Time Base/Decrementer
1.1.7.3 IEEE 1149.1 (JTAG)/COP Test Interface
1.1.7.4 Clock Multiplier
1.2 PowerPC Architecture Implementation
1.3 Implementation-Specific Information
1.3.1 Programming Model
1.3.1.1 Processor Version Register (PVR)
1.3.1.2 Hardware Implementation Register 0 (HID0)
1.3.1.3 Run_N Counter Register (Run_N)
1.3.1.4 General-Purpose Registers (GPRs)
1.3.1.5 Floating-Point Registers (FPRs)
1.3.1.6 Condition Register (CR)
1.3.1.7 Floating-Point Status and Control Register (FPSCR)
1.3.1.8 Machine State Register (MSR)
1.3.1.9 Segment Registers (SRs)
1.3.1.10 Special-Purpose Registers (SPRs)
1.3.1.10.1 User-Level SPRs
1.3.1.10.2 Supervisor-Level SPRs
1.3.2 Instruction Set and Addressing Modes
1.3.2.1 PowerPC Instruction Set and Addressing Modes
1.3.2.1.1 PowerPC Instruction Set
1.3.2.1.2 Calculating Effective Addresses
1.3.2.2 Implementation-Specific Instruction Set
1.3.3 Cache Implementation
1.3.3.1 PowerPC Cache Characteristics
1.3.3.2 Implementation-Specific Cache Implementation
1.3.4 Exception Model
1.3.4.1 PowerPC Exception Model
1.3.4.2 Implementation-Specific Exception Model
1.3.5 Memory Management
1.3.5.1 PowerPC Memory Management
1.3.5.2 Implementation-Specific Memory Management
1.3.6 Instruction Timing
1.3.7 System Interface
1.3.7.1 Memory Accesses
1.3.7.2 Signals
1.3.7.3 Signal Configuration

Chapter 2 Programming Model
2.1 Register Set
2.1.1 PowerPC Register Set
2.1.2 Implementation-Specific Registers
2.1.2.1 Hardware Implementation Registers (HID0 and HID1)
2.1.2.2 Data and Instruction TLB Miss Address Registers (DMISS and IMISS)
2.1.2.3 Data and Instruction TLB Compare Registers (DCMP and ICMP)
2.1.2.4 Primary and Secondary Hash Address Registers (HASH1 and HASH2)
2.1.2.5 Required Physical Address Register (RPA)
2.1.2.6 Instruction Address Breakpoint Register (IABR)
2.1.2.7 Run_N Counter Register (Run_N)
2.2 Operand Conventions
2.2.1 Floating-Point Execution Models: UISA
2.2.2 Data Organization in Memory and Data Transfers
2.2.3 Alignment and Misaligned Accesses
2.2.4 Floating-Point Operand
2.2.5 Effect of Operand Placement on Performance
2.3 Instruction Set Summary
2.3.1 Classes of Instructions
2.3.1.1 Definition of Boundedly Undefined
2.3.1.2 Defined Instruction Class
2.3.1.3 Illegal Instruction Class
2.3.1.4 Reserved Instruction Class
2.3.2 Addressing Modes
2.3.2.1 Memory Addressing
2.3.2.2 Memory Operands
2.3.2.3 Effective Address Calculation
2.3.2.4 Synchronization
2.3.2.4.1 Context Synchronization
2.3.2.4.2 Execution Synchronization
2.3.2.4.3 Instruction-Related Exceptions
2.3.3 Instruction Set Overview
2.3.4 PowerPC UISA Instructions
2.3.4.1 Integer Instructions
2.3.4.1.1 Integer Arithmetic Instructions
2.3.4.1.2 Integer Compare Instructions
2.3.4.1.3 Integer Logical Instructions
2.3.4.1.4 Integer Rotate and Shift Instructions
2.3.4.2 Floating-Point Instructions
2.3.4.2.1 Floating-Point Arithmetic Instructions
2.3.4.2.2 Floating-Point Multiply-Add Instructions
2.3.4.2.3 Floating-Point Rounding and Conversion Instructions
2.3.4.2.4 Floating-Point Compare Instructions
2.3.4.2.5 Floating-Point Status and Control Register Instructions
2.3.4.2.6 Floating-Point Move Instructions
2.3.4.3 Load and Store Instructions
2.3.4.3.1 Self-Modifying Code
2.3.4.3.2 Integer Load and Store Address Generation
2.3.4.3.3 Register Indirect Integer Load Instructions
2.3.4.3.4 Integer Store Instructions
2.3.4.3.5 Integer Load and Store with Byte-Reverse Instructions
2.3.4.3.6 Integer Load and Store Multiple Instructions
2.3.4.3.7 Integer Load and Store String Instructions
2.3.4.3.8 Floating-Point Load and Store Address Generation
2.3.4.3.9 Floating-Point Load Instructions
2.3.4.3.10 Floating-Point Store Instructions
2.3.4.4 Branch and Flow Control Instructions
2.3.4.4.1 Branch Instruction Address Calculation
2.3.4.4.2 Branch Instructions
2.3.4.4.3 Condition Register Logical Instructions
2.3.4.5 Trap Instructions
2.3.4.6 Processor Control Instructions
2.3.4.6.1 Move to/from Condition Register Instructions
2.3.4.7 Memory Synchronization Instructions: UISA
2.3.5 PowerPC VEA Instructions
2.3.5.1 Processor Control Instructions
2.3.5.2 Memory Synchronization Instructions: VEA
2.3.5.3 Memory Control Instructions: VEA
2.3.5.4 External Control Instructions
2.3.6 PowerPC OEA Instructions
2.3.6.1 System Linkage Instructions
2.3.6.2 Processor Control Instructions: OEA
2.3.6.2.1 Move to/from Machine State Register Instructions
2.3.6.2.2 Move to/from Special-Purpose Register Instructions
2.3.6.3 Memory Control Instructions: OEA
2.3.6.3.1 Supervisor-Level Cache Management Instruction
2.3.6.3.2 Segment Register Manipulation Instructions
2.3.6.3.3 Translation Lookaside Buffer Management Instructions
2.3.7 Recommended Simplified Mnemonics
2.3.8 Implementation-Specific Instructions

Chapter 3 Instruction and Data Cache Operation
3.1 Instruction Cache Organization and Control
3.1.1 Instruction Cache Organization
3.1.2 Instruction Cache Fill Operations
3.1.3 Instruction Cache Control
3.1.3.1 Instruction Cache Invalidation
3.1.3.2 Instruction Cache Disabling
3.1.3.3 Instruction Cache Locking
3.2 Data Cache Organization and Control
3.2.1 Data Cache Organization
3.2.2 Data Cache Fill Operations
3.2.3 Data Cache Control
3.2.3.1 Data Cache Invalidation
3.2.3.2 Data Cache Disabling
3.2.3.3 Data Cache Locking
3.2.3.4 Data Cache Operations and Address Broadcasts
3.2.4 Data Cache Touch Load Support
3.3 Basic Data Cache Operations
3.3.1 Data Cache Fill
3.3.2 Data Cache Cast-Out Operation
3.3.3 Cache Block Push Operation
3.4 Data Cache Transactions on Bus
3.4.1 Single-Beat Transactions
3.4.2 Burst Transactions
3.4.3 Access to Direct-Store Segments
3.5 Memory Management/Cache Access Mode Bits: W, I, M, and G
3.5.1 Write-Through Attribute (W)
3.5.2 Caching-Inhibited Attribute (I)
3.5.3 Memory Coherency Attribute (M)
3.5.4 Guarded Attribute (G)
3.5.5 W, I, and M Bit Combinations
3.5.5.1 Out-of-Order Execution and Guarded Memory
3.5.5.2 Effects of Out-of-Order Data Accesses
3.5.5.3 Effects of Out-of-Order Instruction Fetches
3.6 Cache Coherency: MEI Protocol
3.6.1 MEI State Definitions
3.6.2 MEI State Diagram
3.6.3 MEI Hardware Considerations
3.6.4 Coherency Precautions
3.6.4.1 Coherency in Single-Processor Systems
3.6.5 Load and Store Coherency Summary
3.6.6 Atomic Memory References
3.6.7 Cache Reaction to Specific Bus Operations
3.6.8 Operations Causing ARTRY Assertion
3.6.9 Enveloped High-Priority Cache Block Push Operation
3.7 Cache Control Instructions
3.7.1 Data Cache Block Invalidate (dcbi) Instruction
3.7.2 Data Cache Block Touch (dcbt) Instruction
3.7.3 Data Cache Block Touch for Store (dcbtst) Instruction
3.7.4 Data Cache Block Clear to Zero (dcbz) Instruction
3.7.5 Data Cache Block Store (dcbst) Instruction
3.7.6 Data Cache Block Flush (dcbf) Instruction
3.7.7 Enforce In-Order Execution of I/O Instruction (eieio)
3.7.8 Instruction Cache Block Invalidate (icbi) Instruction
3.7.9 Instruction Synchronize (isync) Instruction
3.8 Bus Operations Caused by Cache Control Instructions
3.9 Bus Interface
3.10 MEI State Transactions

Chapter 4 Exceptions
4.1 Exception Classes
4.1.1 Exception Priorities
4.1.2 Summary of Front-End Exception Handling
4.2 Exception Processing
4.2.1 Enabling and Disabling Exceptions
4.2.2 Steps for Exception Processing
4.2.3 Setting MSR[RI]
4.2.4 Returning from an Exception Handler
4.3 Process Switching
4.4 Exception Latencies
4.5 Exception Definitions
4.5.1 Reset Exceptions (0x00100)
4.5.1.1 Hard Reset and Power-On Reset
4.5.1.2 Soft Reset
4.5.2 Machine Check Exception (0x00200)
4.5.2.1 Machine Check Exception Enabled (MSR[ME] = 1)
4.5.2.2 Checkstop State (MSR[ME] = 0)
4.5.3 DSI Exception (0x00300)
4.5.4 ISI Exception (0x00400)
4.5.5 External Interrupt (0x00500)
4.5.6 Alignment Exception (0x00600)
4.5.6.1 Integer Alignment Exceptions
4.5.6.1.1 Page Address Translation Access
4.5.6.2 Floating-Point Alignment Exceptions
4.5.7 Program Exception (0x00700)
4.5.7.1 IEEE Floating-Point Exception Program Exceptions
4.5.7.2 Illegal, Reserved, and Unimplemented Instructions Program Exceptions
4.5.8 Floating-Point Unavailable Exception (0x00800)
4.5.9 Decrementer Exception (0x00900)
4.5.10 System Call Exception (0x00c00)
4.5.11 Trace Exception (0x00d00)
4.5.11.1 Single-Step Instruction Trace Mode
4.5.11.2 Branch Trace Mode
4.5.12 Instruction TLB Miss Exception (0x01000)
4.5.13 Data TLB Miss on Load Exception (0x01100)
4.5.14 Data TLB Miss on Store Exception (0x01200)
4.5.15 Instruction Address Breakpoint Exception (0x01300)
4.5.16 System Management Interrupt (0x01400)

Chapter 5 Memory Management
5.1 MMU Features
5.1.1 Memory Addressing
5.1.2 MMU Organization
5.1.3 Address Translation Mechanisms
5.1.4 Memory Protection Facilities
5.1.5 Page History Information
5.1.6 General Flow of MMU Address Translation
5.1.6.1 Real Addressing Mode and Block Address Translation Selection
5.1.6.2 Page Address Translation Selection
5.1.7 MMU Exceptions Summary
5.1.8 MMU Instructions and Register Summary
5.2 Real Addressing Mode
5.3 Block Address Translation
5.4 Memory Segment Model
5.4.1 Page History Recording
5.4.1.1 Referenced Bit
5.4.1.2 Changed Bit
5.4.1.3 Scenarios for Referenced and Changed Bit Recording
5.4.2 Page Memory Protection
5.4.3 TLB Description
5.4.3.1 tlb organization .... 5-25
5.4.3.2 tlb entry invalidation .... 5-27
5.4.4 page address translation summary .... 5-28
5.5 page table search operation .... 5-30
5.5.1 page table search operation - conceptual flow .... 5-30
5.5.2 implementation-specific table search operation .... 5-33
5.5.2.1 resources for table search operations .... 5-34
5.5.2.1.1 data and instruction tlb miss address registers (dmiss and imiss) .... 5-36
5.5.2.1.2 data and instruction tlb compare registers (dcmp and icmp) .... 5-37
5.5.2.1.3 primary and secondary hash address registers (hash1 and hash2) .... 5-37
5.5.2.1.4 required physical address register (rpa) .... 5-38
5.5.2.2 software table search operation .... 5-38
5.5.2.2.1 flow for example exception handlers .... 5-39
5.5.2.2.2 code for example exception handlers .... 5-44
5.5.3 page table updates .... 5-50
5.5.4 segment register updates .... 5-50
chapter 6 instruction timing
6.1 terminology and conventions .... 6-1
6.2 instruction timing overview .... 6-3
6.3 timing considerations .... 6-5
6.3.1 general instruction flow .... 6-6
6.3.2 instruction fetch timing .... 6-9
6.3.2.1 cache arbitration .... 6-9
6.3.2.2 cache hit .... 6-9
6.3.2.3 cache miss .... 6-10
6.3.3 instruction dispatch and completion considerations .... 6-11
6.3.3.1 rename register operation .... 6-12
6.3.3.2 instruction serialization .... 6-13
6.3.3.3 execution unit considerations .... 6-14
6.4 execution unit timings .... 6-14
6.4.1 branch processing unit execution timing .... 6-14
6.4.1.1 branch folding .... 6-14
6.4.1.2 static branch prediction .... 6-16
6.4.1.2.1 predicted branch timing examples .... 6-16
6.4.2 integer unit execution timing .... 6-18
6.4.3 floating-point unit execution timing .... 6-18
6.4.4 load/store unit execution timing .... 6-18
6.4.5 system register unit execution timing .... 6-18
6.5 memory performance considerations .... 6-18
6.5.1 copy-back mode .... 6-19
6.5.2 write-through mode .... 6-19
6.5.3 cache-inhibited accesses .... 6-20
6.6 instruction scheduling guidelines .... 6-20
6.6.1 branch, dispatch, and completion unit resource requirements .... 6-21
6.6.1.1 branch resolution resource requirements .... 6-21
6.6.1.2 dispatch unit resource requirements .... 6-21
6.6.1.3 completion unit resource requirements .... 6-22
6.7 instruction latency summary .... 6-22
chapter 7 signal descriptions
7.1 signal configuration .... 7-3
7.2 signal descriptions .... 7-4
7.2.1 address bus arbitration signals .... 7-4
7.2.1.1 bus request (br) - output .... 7-4
7.2.1.2 bus grant (bg) - input .... 7-5
7.2.1.3 address bus busy (abb) .... 7-5
7.2.1.3.1 address bus busy (abb) - output .... 7-5
7.2.1.3.2 address bus busy (abb) - input .... 7-6
7.2.2 address transfer start signals .... 7-6
7.2.2.1 transfer start (ts) .... 7-6
7.2.2.1.1 transfer start (ts) - output .... 7-6
7.2.2.1.2 transfer start (ts) - input .... 7-7
7.2.3 address transfer signals .... 7-7
7.2.3.1 address bus (a[0-31]) .... 7-7
7.2.3.1.1 address bus (a[0-31]) - output .... 7-7
7.2.3.1.2 address bus (a[0-31]) - input .... 7-7
7.2.3.2 address bus parity (ap[0-3]) .... 7-8
7.2.3.2.1 address bus parity (ap[0-3]) - output .... 7-8
7.2.3.2.2 address bus parity (ap[0-3]) - input .... 7-8
7.2.3.3 address parity error (ape) - output .... 7-8
7.2.4 address transfer attribute signals .... 7-9
7.2.4.1 transfer type (tt[0-4]) .... 7-9
7.2.4.1.1 transfer type (tt[0-4]) - output .... 7-9
7.2.4.1.2 transfer type (tt[0-4]) - input .... 7-9
7.2.4.2 transfer size (tsiz[0-2]) - output .... 7-12
7.2.4.3 transfer burst (tbst) .... 7-13
7.2.4.3.1 transfer burst (tbst) - output .... 7-13
7.2.4.3.2 transfer burst (tbst) - input .... 7-13
7.2.4.4 transfer code (tc[0-1]) - output .... 7-14
7.2.4.5 cache inhibit (ci) - output .... 7-14
7.2.4.6 write-through (wt) - output .... 7-14
7.2.4.7 global (gbl) .... 7-15
7.2.4.7.1 global (gbl) - output .... 7-15
7.2.4.7.2 global (gbl) - input .... 7-15
7.2.4.8 cache set entry (cse[0-1]) - output .... 7-15
7.2.5 address transfer termination signals .... 7-15
7.2.5.1 address acknowledge (aack) - input .... 7-16
7.2.5.2 address retry (artry) .... 7-16
7.2.5.2.1 address retry (artry) - output .... 7-16
7.2.5.2.2 address retry (artry) - input .... 7-17
7.2.6 data bus arbitration signals .... 7-17
7.2.6.1 data bus grant (dbg) - input .... 7-17
7.2.6.2 data bus write only (dbwo) - input .... 7-18
7.2.6.3 data bus busy (dbb) .... 7-18
7.2.6.3.1 data bus busy (dbb) - output .... 7-18
7.2.6.3.2 data bus busy (dbb) - input .... 7-18
7.2.7 data transfer signals .... 7-19
7.2.7.1 data bus (dh[0-31], dl[0-31]) .... 7-19
7.2.7.1.1 data bus (dh[0-31], dl[0-31]) - output .... 7-19
7.2.7.1.2 data bus (dh[0-31], dl[0-31]) - input .... 7-20
7.2.7.2 data bus parity (dp[0-7]) .... 7-20
7.2.7.2.1 data bus parity (dp[0-7]) - output .... 7-20
7.2.7.2.2 data bus parity (dp[0-7]) - input .... 7-20
7.2.7.3 data parity error (dpe) - output .... 7-21
7.2.7.4 data bus disable (dbdis) - input .... 7-21
7.2.8 data transfer termination signals .... 7-21
7.2.8.1 transfer acknowledge (ta) - input .... 7-22
7.2.8.2 data retry (drtry) - input .... 7-22
7.2.8.3 transfer error acknowledge (tea) - input .... 7-23
7.2.9 system status signals .... 7-23
7.2.9.1 interrupt (int) - input .... 7-23
7.2.9.2 system management interrupt (smi) - input .... 7-24
7.2.9.3 machine check interrupt (mcp) - input .... 7-24
7.2.9.4 checkstop input (ckstp_in) - input .... 7-24
7.2.9.5 checkstop output (ckstp_out) - output .... 7-25
7.2.9.6 reset signals .... 7-25
7.2.9.6.1 hard reset (hreset) - input .... 7-25
7.2.9.6.2 soft reset (sreset) - input .... 7-26
7.2.9.7 processor status signals .... 7-26
7.2.9.7.1 quiescent request (qreq) .... 7-26
7.2.9.7.2 quiescent acknowledge (qack) .... 7-26
7.2.9.7.3 reservation (rsrv) - output .... 7-27
7.2.9.7.4 time base enable (tben) - input .... 7-27
7.2.9.7.5 tlbi sync (tlbisync) .... 7-27
7.2.10 cop/scan interface .... 7-28
7.2.11 pipeline tracking support .... 7-28
7.2.12 clock signals .... 7-29
7.2.12.1 system clock (sysclk) - input .... 7-30
7.2.12.2 test clock (clk_out) - output .... 7-30
7.2.12.3 pll configuration (pll_cfg[0-3]) - input .... 7-30
7.2.13 power and ground signals .... 7-32
chapter 8 system interface operation
8.1 overview .... 8-1
8.1.1 operation of the instruction and data caches .... 8-2
8.1.2 operation of the system interface .... 8-4
8.1.2.1 optional 32-bit data bus mode .... 8-5
8.1.3 direct-store accesses .... 8-6
8.2 memory access protocol .... 8-6
8.2.1 arbitration signals .... 8-7
8.2.2 address pipelining and split-bus transactions .... 8-8
8.3 address bus tenure .... 8-9
8.3.1 address bus arbitration .... 8-9
8.3.2 address transfer .... 8-11
8.3.2.1 address bus parity .... 8-13
8.3.2.2 address transfer attribute signals .... 8-13
8.3.2.2.1 transfer type (tt[0-4]) signals .... 8-13
8.3.2.2.2 transfer size (tsiz[0-2]) signals .... 8-13
8.3.2.3 burst ordering during data transfers .... 8-14
8.3.2.4 effect of alignment in data transfers (64-bit bus) .... 8-15
8.3.2.5 effect of alignment in data transfers (32-bit bus) .... 8-17
8.3.2.5.1 alignment of external control instructions .... 8-19
8.3.2.6 transfer code (tc[0-1]) signals .... 8-20
8.3.3 address transfer termination .... 8-20
8.4 data bus tenure .... 8-22
8.4.1 data bus arbitration .... 8-22
8.4.1.1 using the dbb signal .... 8-23
8.4.2 data bus write only .... 8-24
8.4.3 data transfer .... 8-24
8.4.4 data transfer termination .... 8-25
8.4.4.1 normal single-beat termination .... 8-26
8.4.4.2 data transfer termination due to a bus error .... 8-29
8.4.5 memory coherency - mei protocol .... 8-30
8.5 timing examples .... 8-32
8.6 optional bus configurations .... 8-38
8.6.1 32-bit data bus mode .... 8-38
8.6.2 no-drtry mode .... 8-40
8.6.3 reduced-pinout mode .... 8-40
8.7 interrupt, checkstop, and reset signals .... 8-41
8.7.1 external interrupts .... 8-41
8.7.2 checkstops .... 8-41
8.7.3 reset inputs .... 8-41
8.7.4 system quiesce control signals .... 8-42
8.8 processor state signals .... 8-42
8.8.1 support for the lwarx/stwcx. instruction pair .... 8-42
8.8.2 tlbisync input .... 8-42
8.9 ieee 1149.1-compliant interface .... 8-43
8.9.1 ieee 1149.1 interface description .... 8-43
8.10 using data bus write only .... 8-43
chapter 9 power management
9.1 dynamic power management .... 9-1
9.2 programmable power modes .... 9-1
9.2.1 power management modes .... 9-3
9.2.1.1 full-power mode with dpm disabled .... 9-3
9.2.1.2 full-power mode with dpm enabled .... 9-3
9.2.1.3 doze mode .... 9-4
9.2.1.4 nap mode .... 9-4
9.2.1.5 sleep mode .... 9-5
9.2.2 power management software considerations .... 9-6
appendix a powerpc instruction set listings
a.1 instructions sorted by mnemonic .... a-1
a.2 instructions sorted by opcode .... a-9
a.3 instructions grouped by functional categories .... a-17
a.4 instructions sorted by form .... a-28
a.5 instruction set legend .... a-39
appendix b instructions not implemented
appendix c powerpc 603 processor system design and programming considerations
c.1 powerpc 603 microprocessor hardware considerations .... c-1
c.1.1 hardware support for direct-store accesses .... c-1
c.1.1.1 extended address transfer start (xats) .... c-2
c.1.1.1.1 extended address transfer start (xats) - output .... c-2
c.1.1.1.2 extended address transfer start (xats) - input .... c-2
c.1.2 direct-store protocol operation .... c-2
c.1.2.1 direct-store transactions .... c-4
c.1.2.1.1 store operations .... c-5
c.1.2.1.2 load operations .... c-5
c.1.2.2 direct-store transaction protocol details .... c-6
c.1.2.2.1 packet 0 .... c-7
c.1.2.2.2 packet 1 .... c-8
c.1.2.3 i/o reply operations .... c-8
c.1.2.4 direct-store operation timing .... c-10
c.1.3 cse signal .... c-12
c.1.4 powerpc 603 processor bus clock multiplier configuration .... c-12
c.1.5 powerpc 603 processor cache organization .... c-13
c.1.5.1 instruction cache organization .... c-14
c.1.5.2 data cache organization .... c-14
c.1.6 pll configuration (pll_cfg[0-3]) - input .... c-15
c.1.7 address pipelining and split-bus transactions .... c-15
c.1.8 data bus arbitration .... c-16
c.2 powerpc 603 processor software considerations .... c-16
c.2.1 direct-store interface address translation .... c-16
c.2.1.1 direct-store segment translation summary flow .... c-17
c.2.1.2 direct-store interface accesses .... c-18
c.2.1.3 direct-store segment protection .... c-18
c.2.1.4 instructions not supported in direct-store segments .... c-19
c.2.1.5 instructions with no effect in direct-store segments .... c-19
c.2.2 store instruction latency .... c-19
c.2.3 instruction execution by system register unit .... c-20
c.2.4 machine check exception (0x00200) .... c-21
c.2.5 instruction address breakpoint exception (0x01400) .... c-21
c.2.6 cache control instructions .... c-21
glossary of terms and abbreviations
index
illustrations
figure number  title  page number
1-1 block diagram .... 1-6
1-2 programming model - registers .... 1-22
1-3 data cache organization .... 1-27
1-4 exception classifications .... 1-29
1-5 exceptions and conditions .... 1-29
1-6 system interface .... 1-35
1-7 signal groups .... 1-38
2-1 programming model - registers .... 2-3
2-2 hardware implementation register 0 (hid0) .... 2-7
2-3 hardware implementation register 1 (hid1) .... 2-9
2-4 dmiss and imiss registers .... 2-9
2-5 dcmp and icmp registers .... 2-10
2-6 hash1 and hash2 registers .... 2-10
2-7 required physical address register (rpa) .... 2-11
2-8 instruction address breakpoint register (iabr) .... 2-11
3-1 instruction cache organization .... 3-3
3-2 data cache organization .... 3-5
3-3 double-word address ordering - critical double word first .... 3-9
3-4 mei cache coherency protocol - state diagram (wim = 001) .... 3-16
3-5 bus interface address buffers .... 3-28
4-1 exceptions and conditions .... 4-4
4-2 machine status save/restore register 0 .... 4-10
4-3 machine status save/restore register 1 .... 4-10
4-4 machine state register (msr) .... 4-12
5-1 mmu conceptual block diagram - 32-bit implementations .... 5-5
5-2 immu block diagram .... 5-6
5-3 dmmu block diagram .... 5-7
5-4 address translation types .... 5-9
5-5 general flow of address translation (real addressing mode and block) .... 5-12
5-6 general flow of page and direct-store interface address translation .... 5-13
5-7 segment register and tlb organization .... 5-26
5-8 page address translation flow for 32-bit implementations - tlb hit .... 5-29
5-9 primary page table search - conceptual flow .... 5-32
5-10 secondary page table search flow - conceptual flow .... 5-33
5-11 derivation of key bit for srr1 .... 5-36
5-12 dmiss and imiss registers .... 5-36
5-13 dcmp and icmp registers .... 5-37
5-14 hash1 and hash2 registers .... 5-37
5-15 required physical address (rpa) register .... 5-38
5-16 flow for example software table search operation .... 5-40
5-17 check and set r and c bit flow .... 5-41
5-18 page fault setup flow .... 5-42
5-19 setup for protection violation exceptions .... 5-43
6-1 pipelined execution unit .... 6-4
6-2 instruction flow diagram .... 6-8
6-3 instruction timing - cache hit .... 6-10
6-4 instruction timing - cache miss .... 6-11
6-5 branch instruction timing .... 6-17
7-1 signal groups .... 7-3
7-2 ieee 1149.1-compliant boundary scan interface .... 7-28
8-1 block diagram .... 8-3
8-2 timing diagram legend .... 8-5
8-3 overlapping tenures on the bus for a single-beat transfer .... 8-6
8-4 address bus arbitration .... 8-10
8-5 address bus arbitration showing bus parking .... 8-11
8-6 address bus transfer .... 8-12
8-7 snooped address cycle with artry .... 8-22
8-8 data bus arbitration .... 8-23
8-9 normal single-beat read termination .... 8-26
8-10 normal single-beat write termination .... 8-27
8-11 normal burst transaction .... 8-27
8-12 termination with drtry .... 8-28
8-13 read burst with ta wait states and drtry .... 8-29
8-14 mei cache coherency protocol - state diagram (wim = 001) .... 8-31
8-15 fastest single-beat reads .... 8-32
8-16 fastest single-beat writes .... 8-33
8-17 single-beat reads showing data-delay controls .... 8-34
8-18 single-beat writes showing data delay controls .... 8-35
8-19 burst transfers with data delay controls .... 8-36
8-20 use of transfer error acknowledge (tea) .... 8-37
8-21 32-bit data bus transfer (eight-beat burst) .... 8-39
8-22 32-bit data bus transfer (two-beat burst with drtry) .... 8-39
8-23 data bus write only transaction .... 8-44
c-1 direct-store tenures .... c-4
c-2 direct-store operation - packet 0 .... c-7
c-3 direct-store operation - packet 1 .... c-8
c-4 i/o reply operation .... c-9
c-5 direct-store interface load access example .... c-11
c-6 direct-store interface store access example .... c-12
c-7 instruction cache organization .... c-14
motorola illustrations xix illustrations figure number title page number c-8 data cache organization ..................................................................................c-15 c-9 direct-store segment translation flow ............................................................c-17
tables
table number  title
i acronyms and abbreviated terms
ii terminology conventions
iii instruction field conventions
1-1 cse[0-1] signals
1-2 generated srr1 [key] bit
1-3 additional/changed hid0 bits
2-1 msr[pow] and msr[tgpr] bits
2-2 hid0 bit settings
2-3 hid1 bit settings
2-4 dcmp and icmp bit settings
2-5 hash1 and hash2 bit settings
2-6 rpa bit settings
2-7 instruction address breakpoint register bit settings
2-8 memory operands
2-9 integer arithmetic instructions
2-10 integer compare instructions
2-11 integer logical instructions
2-12 integer rotate instructions
2-13 integer shift instructions
2-14 floating-point arithmetic instructions
2-15 floating-point multiply-add instructions
2-16 floating-point rounding and conversion instructions
2-17 floating-point compare instructions
2-18 floating-point status and control register instructions
2-19 floating-point move instructions
2-20 integer load instructions
2-21 integer store instructions
2-22 integer load and store with byte-reverse instructions
2-23 integer load and store multiple instructions
2-24 integer load and store string instructions
2-25 floating-point load instructions
2-26 floating-point store instructions
2-27 branch instructions
2-28 condition register logical instructions
2-29 trap instructions
2-30 move to/from condition register instructions
2-31 memory synchronization instructions - uisa
2-32 move from time base instruction
2-33 memory synchronization instructions - vea
2-34 user-level cache instructions
2-35 external control instructions
2-36 system linkage instructions
2-37 move to/from machine state register instructions
2-38 move to/from special-purpose register instructions
2-39 implementation-specific spr encodings (mfspr)
2-40 supervisor-level cache management instruction
2-41 segment register manipulation instructions
2-42 translation lookaside buffer management instructions
3-1 combinations of w, i, and m bits
3-2 mei state definitions
3-3 cse[0-1] signal encoding
3-4 memory coherency actions on load operations
3-5 memory coherency actions on store operations
3-6 response to bus transactions
3-7 bus operations caused by cache control instructions (wim = 001)
3-8 mei state transitions
4-1 exception classifications
4-2 exception priorities
4-3 srr1 bit settings for machine check exceptions
4-4 srr1 bit settings for software table search operations
4-5 msr bit settings
4-6 ieee floating-point exception mode bits
4-7 msr setting due to exception
4-8 settings caused by hard reset
4-9 soft reset exception - register settings
4-10 machine check exception - register settings
4-11 dsi exception - register settings
4-12 external interrupt - register settings
4-13 alignment interrupt - register settings
4-14 access types
4-15 trace exception - register settings
4-16 instruction and data tlb miss exceptions - register settings
4-17 instruction address breakpoint exception - register settings
4-18 breakpoint action for multiple modes enabled for the same address
4-19 system management interrupt - register settings
5-1 mmu features summary
5-2 access protection options for pages
5-3 translation exception conditions
5-4 other mmu exception conditions
5-5 instruction summary - mmu control
5-6 mmu registers
5-7 table search operations to update history bits - tlb hit case
5-8 model for guaranteed r and c bit settings
5-9 implementation-specific resources for table search operations
5-10 implementation-specific srr1 bits
5-11 dcmp and icmp bit settings
5-12 hash1 and hash2 bit settings
5-13 rpa bit settings
6-1 branch instructions
6-2 system register instructions
6-3 condition register logical instructions
6-4 integer instructions
6-5 floating-point instructions
6-6 load and store instructions
7-1 transfer encoding for the bus master
7-2 snoop hit response
7-3 implementation-specific transfer encoding
7-4 clk_out signal configuration
7-5 data transfer size
7-6 encodings for tc[0-1] signals
7-7 data bus lane assignments
7-8 dp[0-7] signal assignments
7-9 pipeline tracking outputs
7-10 pll configuration
8-1 transfer size signal encodings
8-2 burst ordering - 64-bit bus
8-3 burst ordering - 32-bit bus
8-4 aligned data transfers (64-bit bus)
8-5 misaligned data transfers (four-byte examples)
8-6 aligned data transfers (32-bit bus mode)
8-7 misaligned 32-bit data bus transfer (four-byte examples)
8-8 transfer code encoding
8-9 cse[0-1] signals
8-10 ieee interface pin descriptions
9-1 programmable power modes
a-1 complete instruction list sorted by mnemonic
a-2 complete instruction list sorted by opcode
a-3 integer arithmetic instructions
a-4 integer compare instructions
a-5 integer logical instructions
a-6 integer rotate instructions
a-7 integer shift instructions
a-8 floating-point arithmetic instructions
a-9 floating-point multiply-add instructions
a-10 floating-point rounding and conversion instructions
a-11 floating-point compare instructions
a-12 floating-point status and control register instructions
a-13 integer load instructions
a-14 integer store instructions
a-15 integer load and store with byte-reverse instructions
a-16 integer load and store multiple instructions
a-17 integer load and store string instructions
a-18 memory synchronization instructions
a-19 floating-point load instructions
a-20 floating-point store instructions
a-21 floating-point move instructions
a-22 branch instructions
a-23 condition register logical instructions
a-24 system linkage instructions
a-25 trap instructions
a-26 processor control instructions
a-27 cache management instructions
a-28 segment register manipulation instructions
a-29 lookaside buffer management instructions
a-30 external control instructions
a-31 i-form
a-32 b-form
a-33 sc-form
a-34 d-form
a-35 ds-form
a-36 x-form
a-37 xl-form
a-38 xfx-form
a-39 xfl-form
a-40 xs-form
a-41 xo-form
a-42 a-form
a-43 m-form
a-44 md-form
a-45 mds-form
a-46 powerpc instruction set legend
b-1 32-bit instructions not implemented by the powerpc 603e
b-2 64-bit instructions not implemented
b-3 floating-point instructions not supported by the EC603E microprocessor
b-4 64-bit spr encoding not implemented
c-1 direct-store bus operations
c-2 address bits for i/o reply operations
c-3 cse signal encoding
c-4 powerpc 603 microprocessor pll configuration
c-5 store instruction timing
c-6 system register instructions
about this book

the primary objective of this user's manual is to define the functionality of the powerpc 603 and powerpc 603e microprocessors for use by software and hardware developers. although the emphasis of this manual is upon the 603e, all of the information within applies to both the 603 and 603e, except for those differences noted in appendix c, "powerpc 603 processor system design and programming considerations." those readers who are primarily interested in the 603 should begin with appendix c.

in addition, this book describes the EC603E microprocessor. the EC603E microprocessor for embedded systems is functionally equivalent to the 603e with the exception of the floating-point unit, which is not supported on the EC603E microprocessor; therefore, the term "EC603E" is used only when it is necessary to distinguish functional differences with the EC603E microprocessor.

the 603e is built upon the low-power dissipation, low-cost, and high-performance attributes of the 603 while providing the system designer additional capabilities through higher processor clock speeds, increases in cache size (16-kbyte instruction and data caches) and set-associativity (4-way), and greater system clock flexibility. the 603e only implements the 32-bit portion of the powerpc architecture.

the 603e and EC603E microprocessors are implemented in both a 2.5-volt version (pid 0007v 603e microprocessor, abbreviated as pid7v-603e) and a 3.3-volt version (pid 0006 603e microprocessor, abbreviated as pid6-603e).

in this document, the term "603e" is used as an abbreviation for "powerpc 603e microprocessor" and the term "603" is an abbreviation for "powerpc 603 microprocessor." the powerpc 603e microprocessors are available from motorola as mpc603e. the EC603E microprocessors are available from motorola as mpe603e.
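to make the cache figures quoted above concrete, the following is a minimal sketch of how a 16-kbyte, 4-way set-associative cache divides a 32-bit address into tag, set index, and byte offset. the 32-byte line size is an assumption that this section does not state, and the names `CacheGeometry`, `cache_geometry`, and `split_address` are hypothetical helpers written for this illustration, not part of any motorola tool.

```python
from typing import NamedTuple, Tuple

CACHE_BYTES = 16 * 1024  # 16-kbyte cache (from the text)
WAYS = 4                 # 4-way set-associative (from the text)
LINE_BYTES = 32          # ASSUMPTION: 32-byte cache line, not stated in this section
ADDRESS_BITS = 32        # the 603e implements the 32-bit architecture (from the text)


class CacheGeometry(NamedTuple):
    sets: int
    offset_bits: int
    index_bits: int
    tag_bits: int


def cache_geometry(cache_bytes: int, ways: int, line_bytes: int,
                   address_bits: int = ADDRESS_BITS) -> CacheGeometry:
    """Derive the set count and address-field widths of a set-associative cache."""
    sets, remainder = divmod(cache_bytes, ways * line_bytes)
    if remainder or sets < 1 or sets & (sets - 1) or line_bytes & (line_bytes - 1):
        raise ValueError("sizes must divide evenly into power-of-two sets and lines")
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return CacheGeometry(sets, offset_bits, index_bits, tag_bits)


def split_address(addr: int, geom: CacheGeometry) -> Tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << geom.offset_bits) - 1)
    index = (addr >> geom.offset_bits) & (geom.sets - 1)
    tag = addr >> (geom.offset_bits + geom.index_bits)
    return tag, index, offset


if __name__ == "__main__":
    geom = cache_geometry(CACHE_BYTES, WAYS, LINE_BYTES)
    print(geom)
    print([hex(field) for field in split_address(0x12345678, geom)])
```

under the stated line-size assumption, the arithmetic gives 16384 / (4 x 32) = 128 sets, so an address splits into a 20-bit tag, a 7-bit set index, and a 5-bit byte offset. the sketch counts bits from the least-significant end; powerpc documentation numbers bits from the most-significant end, so field positions quoted in the manual will use different bit numbers for the same fields.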
it is important to note that this book is intended as a companion to the powerpc microprocessor family: the programming environments, referred to as the programming environments manual; contact your local sales representative to obtain a copy. because the powerpc architecture is designed to be flexible to support a broad range of processors, the programming environments manual provides a general description of features that are common to powerpc processors and indicates those features that are optional or that may be implemented differently in the design of each processor.
this document summarizes features of the 603e that are not defined by the architecture. this document and the programming environments manual distinguish between the three levels, or programming environments, of the powerpc architecture, which are as follows:

powerpc user instruction set architecture (uisa): the uisa defines the level of the architecture to which user-level software should conform. the uisa defines the base user-level instruction set, user-level registers, data types, memory conventions, and the memory and programming models seen by application programmers.

powerpc virtual environment architecture (vea): the vea, which is the smallest component of the powerpc architecture, defines additional user-level functionality that falls outside typical user-level software requirements. the vea describes the memory model for an environment in which multiple processors or other devices can access external memory, and defines aspects of the cache model and cache control instructions from a user-level perspective. the resources defined by the vea are particularly useful for optimizing memory accesses and for managing resources in an environment in which other processors and other devices can access external memory.

powerpc operating environment architecture (oea): the oea defines supervisor-level resources typically required by an operating system. the oea defines the powerpc memory management model, supervisor-level registers, and the exception model. implementations that conform to the powerpc oea also conform to the powerpc uisa and vea.

it is important to note that some resources are defined more generally at one level in the architecture and more specifically at another. for example, conditions that cause a floating-point exception are defined by the uisa, while the exception mechanism itself is defined by the oea.
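the relationship between the three levels can be sketched as a small lookup structure. this is only an illustration of the statements above, not a model taken from the architecture specification: the resource names are paraphrases of the level descriptions, and `Level`, `ALSO_CONFORMS_TO`, `DEFINED_BY`, `conformance`, and `covers` are hypothetical names introduced here.

```python
from enum import Enum


class Level(Enum):
    UISA = "user instruction set architecture"
    VEA = "virtual environment architecture"
    OEA = "operating environment architecture"


# Stated in the text: conforming to the OEA also means conforming to UISA and VEA.
# The text gives no such rule for the UISA or VEA, so none is encoded here.
ALSO_CONFORMS_TO = {
    Level.UISA: frozenset(),
    Level.VEA: frozenset(),
    Level.OEA: frozenset({Level.UISA, Level.VEA}),
}

# Which level defines which resource (paraphrased from the level descriptions).
DEFINED_BY = {
    "base user-level instruction set": Level.UISA,
    "user-level registers": Level.UISA,
    "floating-point exception conditions": Level.UISA,
    "multiprocessor memory model": Level.VEA,
    "user-level cache control instructions": Level.VEA,
    "memory management model": Level.OEA,
    "supervisor-level registers": Level.OEA,
    "exception mechanism": Level.OEA,
}


def conformance(level):
    """Return every level an implementation conforming to `level` conforms to."""
    return frozenset({level}) | ALSO_CONFORMS_TO[level]


def covers(implementation_level, resource):
    """True if the resource's defining level is within the implementation's conformance."""
    return DEFINED_BY[resource] in conformance(implementation_level)


if __name__ == "__main__":
    print(sorted(level.name for level in conformance(Level.OEA)))
    print(covers(Level.UISA, "exception mechanism"))
```

the floating-point example from the text appears as two separate entries: the exception conditions map to the uisa, while the exception mechanism maps to the oea, which is why a chapter on exceptions draws on more than one level.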
because it is important to distinguish between the levels of the architecture in order to ensure compatibility across multiple platforms, those distinctions are shown clearly throughout this book.

for ease in reference, this book has arranged topics described by the architecture into topics that build upon one another, beginning with a description and complete summary of 603e-specific registers and progressing to more specialized topics such as 603e-specific details regarding the cache, exception, and memory management models. as such, chapters may include information from multiple levels of the architecture. (for example, the discussion of the cache model uses information from both the vea and the oea.)

the powerpc architecture: a specification for a new family of risc processors defines the architecture from the perspective of the three programming environments and remains the defining document for the powerpc architecture.

the information in this book is subject to change without notice, as described in the disclaimers on the title page of this book. as with any technical documentation, it is the reader's responsibility to be sure they are using the most recent version of the documentation. for more information, contact your sales representative.

audience

this manual is intended for system software and hardware developers and applications programmers who want to develop products using the 603e microprocessors. it is assumed that the reader understands operating systems, microprocessor system design, the basic principles of risc processing, and details of the powerpc architecture.

organization

following is a summary and a brief description of the major sections of this manual:

chapter 1, "overview," is useful for readers who want a general understanding of the features and functions of the powerpc architecture and the 603e. this chapter describes the flexible nature of the powerpc architecture definition, and provides an overview of how the powerpc architecture defines the register set, operand conventions, addressing modes, instruction set, cache model, exception model, and memory management model.

chapter 2, "programming model," provides a brief synopsis of the registers implemented in the 603e, operand conventions, an overview of the powerpc addressing modes, and a list of the instructions implemented by the 603e. instructions are organized by function.

chapter 3, "instruction and data cache operation," provides a discussion of the cache and memory model as implemented on the 603e.

chapter 4, "exceptions," describes the exception model defined in the powerpc oea and the specific exception model implemented on the 603e.

chapter 5, "memory management," describes the 603e's implementation of the memory management unit specifications provided by the powerpc oea for powerpc processors.

chapter 6, "instruction timing," provides information about latencies, interlocks, special situations, and various conditions to help make programming more efficient. this chapter is of special interest to software engineers and system designers.
chapter 7, "signal descriptions," provides descriptions of individual signals of the 603e.

chapter 8, "system interface operation," describes signal timings for various operations. it also provides information for interfacing to the 603e.

chapter 9, "power management," provides information about power saving modes for the 603e.

appendix a, "powerpc instruction set listings," lists all the powerpc instructions while indicating those instructions that are not implemented by the 603e; it also includes the instructions that are specific to the 603e. instructions are grouped according to mnemonic, opcode, function, and form. also included is a quick reference table that contains general information, such as the architecture level, privilege level, and form, and indicates if the instruction is 64-bit and optional.

appendix b, "instructions not implemented," provides a list of powerpc instructions not implemented by the 603e.

appendix c, "powerpc 603 processor system design and programming considerations," provides a discussion of the hardware and software differences between the 603 and 603e.

this manual also includes a glossary and an index.

suggested reading

this section lists additional reading that provides background for the information in this manual as well as general information about the powerpc architecture.

general information

the following documentation provides useful information about the powerpc architecture and computer architecture in general:

the following books are available from the morgan-kaufmann publishers, 340 pine street, sixth floor, san francisco, ca 94104; tel. (800) 745-7323 (u.s.a.), (415) 392-2665 (international); internet address: mkp@mkp.com.
  the powerpc architecture: a specification for a new family of risc processors, second edition, by international business machines, inc. updates to the architecture specification are accessible via the world-wide web at http://www.austin.ibm.com/tech/ppc-chg.html.
  powerpc microprocessor common hardware reference platform: a system architecture, by apple computer, inc., international business machines, inc., and motorola, inc.
  macintosh technology in the common hardware reference platform, by apple computer, inc.
  computer architecture: a quantitative approach, second edition, by john l. hennessy and david a. patterson

inside macintosh: powerpc system software, addison-wesley publishing company, one jacob way, reading, ma, 01867; tel. (800) 282-2732 (u.s.a.), (800) 637-0029 (canada), (716) 871-6555 (international).

powerpc programming for intel programmers, by kip mcclanahan; idg books worldwide, inc., 919 east hillsdale boulevard, suite 400, foster city, ca, 94404; tel. (800) 434-3422 (u.s.a.), (415) 655-3022 (international).
PowerPC Documentation

The PowerPC documentation is available from the sources listed on the back cover of this manual; the document order numbers are included in parentheses for ease in ordering:

User's manuals: These books provide details about individual PowerPC implementations and are intended to be used in conjunction with the Programming Environments Manual. These include the following:
- PowerPC 604 RISC Microprocessor User's Manual: MPC604UM/AD (Motorola order #)
- MPC750 RISC Microprocessor User's Manual: MPC750UM/AD (Motorola order #)
- PowerPC 620 RISC Microprocessor User's Manual: MPC620UM/AD (Motorola order #)

Programming environments manuals: These books provide information about resources defined by the PowerPC architecture that are common to PowerPC processors. There are two versions, one that describes the functionality of the combined 32- and 64-bit architecture models and one that describes only the 32-bit model.
- PowerPC Microprocessor Family: The Programming Environments, Rev 1: MPCFPE/AD (Motorola order #)
- PowerPC Microprocessor Family: The Programming Environments for 32-Bit Microprocessors, Rev. 1: MPCFPE32B/AD (Motorola order #)

Implementation Variances Relative to Rev. 1 of The Programming Environments Manual is available via the World-Wide Web at http://www.motorola.com/powerpc/.

Addenda/errata to user's manuals: Because some processors have follow-on parts, an addendum is provided that describes the additional features and changes to functionality of the follow-on part. These addenda are intended for use with the corresponding user's manuals. These include the following:
- Addendum to PowerPC 604 RISC Microprocessor User's Manual: PowerPC 604e Microprocessor Supplement and User's Manual Errata: MPC604UMAD/AD (Motorola order #)

Hardware specifications: Hardware specifications provide specific data regarding bus timing, signal behavior, and AC, DC, and thermal characteristics, as well as other design considerations for each PowerPC implementation. These include the following:
- PowerPC 603 RISC Microprocessor Hardware Specifications: MPC603EC/D (Motorola order #)
- PowerPC 603e RISC Microprocessor Family: PID6-603e Hardware Specifications: MPC603EEC/D (Motorola order #)
- PowerPC 603e RISC Microprocessor Family: PID7v-603e Hardware Specifications: MPC603E7VEC/D (Motorola order #)
- PowerPC 603e RISC Microprocessor Family: PID7t-603e Hardware Specifications: MPC603E7TEC/D (Motorola order #)
- PowerPC 604 RISC Microprocessor Hardware Specifications: MPC604EC/D (Motorola order #)
- PowerPC 604e RISC Microprocessor Family: PID9v-604e Hardware Specifications: MPC604E9VEC/D (Motorola order #)
- PowerPC 604e RISC Microprocessor Family: PID9q-604e Hardware Specifications: MPC604E9QEC/D (Motorola order #)
- MPC750 RISC Microprocessor Hardware Specifications: MPC750EC/D (Motorola order #)
- EC603E Embedded RISC Microprocessor (PID6) Hardware Specifications: MPE603EEC/D (Motorola order #)
- EC603E Embedded RISC Microprocessor (PID7v) Hardware Specifications: MPE603E7VEC/D (Motorola order #)

Technical summaries: Each PowerPC implementation has a technical summary that provides an overview of its features. This document is roughly equivalent to the overview (Chapter 1) of an implementation's user's manual. Technical summaries are available for the 601, 603, 603e, 604, 604e, and EC603E microprocessors, which can be ordered as follows:
- EC603E Embedded RISC Microprocessor Technical Summary: MPE603E/D (Motorola order #)

PowerPC Microprocessor Family: The Bus Interface for 32-Bit Microprocessors: MPCBUSIF/AD (Motorola order #) provides a detailed functional description of the 60x bus interface, as implemented on the 601, 603, and 604 family of PowerPC microprocessors. This document is intended to help system and chipset developers by providing a centralized reference source to identify the bus interface presented by the 60x family of PowerPC microprocessors.

PowerPC Microprocessor Family: The Programmer's Reference Guide: MPCPRG/D (Motorola order #) is a concise reference that includes the register summary, memory control model, exception vectors, and the PowerPC instruction set.

PowerPC Microprocessor Family: The Programmer's Pocket Reference Guide: MPCPRGREF/D (Motorola order #). This foldout card provides an overview of the PowerPC registers, instructions, and exceptions for 32-bit implementations.
Application notes: These short documents contain useful information about specific design issues useful to programmers and engineers working with PowerPC processors.

Documentation for support chips: These include the following:
- MPC105 PCI Bridge/Memory Controller User's Manual: MPC105UM/AD (Motorola order #)
- MPC106 PCI Bridge/Memory Controller User's Manual: MPC106UM/AD (Motorola order #)

Additional literature on PowerPC implementations is being released as new processors become available. For a current list of PowerPC documentation, refer to the World-Wide Web at http://www.mot.com/sps/powerpc/.

Conventions

This document uses the following notational conventions:

mnemonics          Instruction mnemonics are shown in lowercase bold.
italics            Italics indicate variable command parameters, for example, bcctrx. Book titles in text are set in italics.
0x0                Prefix to denote hexadecimal number
0b0                Prefix to denote binary number
rA, rB             Instruction syntax used to identify a source GPR
rA|0               The contents of a specified GPR or the value 0.
rD                 Instruction syntax used to identify a destination GPR
frA, frB, frC      Instruction syntax used to identify a source FPR
frD                Instruction syntax used to identify a destination FPR
REG[FIELD]         Abbreviations or acronyms for registers are shown in uppercase text. Specific bits, fields, or ranges appear in brackets. For example, MSR[LE] refers to the little-endian mode enable bit in the machine state register.
x                  In certain contexts, such as a signal encoding, this indicates a don't care.
n                  Used to express an undefined numerical value
¬                  NOT logical operator
&                  AND logical operator
|                  OR logical operator
0 0 0 0 (shaded)   Indicates reserved bits or bit fields in a register. Although these bits may be written to as either ones or zeros, they are always read as zeros.
Acronyms and Abbreviations

Table i contains acronyms and abbreviations that are used in this document.

Table i. Acronyms and Abbreviated Terms

Term     Meaning
ALU      Arithmetic logic unit
ATE      Automatic test equipment
ASR      Address space register
BAT      Block address translation
BIST     Built-in self test
BIU      Bus interface unit
BPU      Branch processing unit
BUC      Bus unit controller
BUID     Bus unit ID
CAR      Cache address register
CIA      Current instruction address
CMOS     Complementary metal-oxide semiconductor
COP      Common on-chip processor
CR       Condition register
CRTRY    Cache retry queue
CTR      Count register
DAR      Data address register
DBAT     Data BAT
DCMP     Data TLB compare
DEC      Decrementer register
DMISS    Data TLB miss address
DSISR    Register used for determining the source of a DSI exception
DTLB     Data translation lookaside buffer
EA       Effective address
EAR      External access register
ECC      Error checking and correction
FIFO     First-in-first-out
FPR      Floating-point register (Note that the EC603E microprocessor does not support the floating-point unit.)
FPSCR    Floating-point status and control register (Note that the EC603E microprocessor does not support the floating-point unit.)
FPU      Floating-point unit (Note that the EC603E microprocessor does not support the floating-point unit.)
GPR      General-purpose register
HASH1    Primary hash address
HASH2    Secondary hash address
IABR     Instruction address breakpoint register
IBAT     Instruction BAT
ICMP     Instruction TLB compare
IEEE     Institute of Electrical and Electronics Engineers
IMISS    Instruction TLB miss address
IQ       Instruction queue
ITLB     Instruction translation lookaside buffer
IU       Integer unit
L2       Secondary cache
LIFO     Last-in-first-out
LR       Link register
LRU      Least recently used
LSB      Least-significant byte
lsb      Least-significant bit
LSU      Load/store unit
MEI      Modified/exclusive/invalid
MESI     Modified/exclusive/shared/invalid (cache coherency protocol)
MMU      Memory management unit
MQ       MQ register
MSB      Most-significant byte
msb      Most-significant bit
MSR      Machine state register
NaN      Not a number
No-op    No operation
OEA      Operating environment architecture
PID      Processor identification tag
PIR      Processor identification register
PLL      Phase-locked loop
POWER    Performance Optimized with Enhanced RISC architecture
PTE      Page table entry
PTEG     Page table entry group
PVR      Processor version register
RAW      Read-after-write
RISC     Reduced instruction set computing
RPA      Required physical address
RTL      Register transfer language
RWITM    Read with intent to modify
SDR1     Register that specifies the page table base address for virtual-to-physical address translation
SLB      Segment lookaside buffer
SPR      Special-purpose register
SR       Segment register
SRR0     Machine status save/restore register 0
SRR1     Machine status save/restore register 1
SRU      System register unit
TAP      Test access port
TB       Time base facility
TBL      Time base lower register
TBU      Time base upper register
TLB      Translation lookaside buffer
TTL      Transistor-to-transistor logic
UIMM     Unsigned immediate value
UISA     User instruction set architecture
UTLB     Unified translation lookaside buffer
UUT      Unit under test
VEA      Virtual environment architecture
WAR      Write-after-read
WAW      Write-after-write
WIMG     Write-through/caching-inhibited/memory-coherency enforced/guarded bits
XATC     Extended address transfer code
XER      Register used for indicating conditions such as carries and overflows for integer operations
Terminology Conventions

Table ii describes terminology conventions used in this manual.

Table ii. Terminology Conventions

The Architecture Specification          This Manual
Data storage interrupt (DSI)            DSI exception
Extended mnemonics                      Simplified mnemonics
Fixed-point unit (FXU)                  Integer unit (IU)
Instruction storage interrupt (ISI)     ISI exception
Interrupt                               Exception
Privileged mode (or privileged state)   Supervisor-level privilege
Problem mode (or problem state)         User-level privilege
Real address                            Physical address
Relocation                              Translation
Storage (locations)                     Memory
Storage (the act of)                    Access
Store in                                Write back
Store through                           Write through

Table iii describes instruction field notation used in this manual.

Table iii. Instruction Field Conventions

The Architecture Specification   Equivalent to:
BA, BB, BT                       crbA, crbB, crbD (respectively)
BF, BFA                          crfD, crfS (respectively)
D                                d
DS                               ds
FLM                              FM
FRA, FRB, FRC, FRT, FRS          frA, frB, frC, frD, frS (respectively)
FXM                              CRM
RA, RB, RT, RS                   rA, rB, rD, rS (respectively)
SI                               SIMM
U                                IMM
UI                               UIMM
/, //, ///                       0...0 (shaded)
Chapter 1
Overview

This chapter provides an overview of features of the PowerPC 603e microprocessor and the PowerPC architecture, and information about how the 603e implementation complies with the architectural definitions. In addition, this book describes the EC603E microprocessor. Note that the 603e and EC603E microprocessors are implemented in both a 2.5-volt version (PID 0007v 603e microprocessor, abbreviated as PID7v-603e) and a 3.3-volt version (PID 0006 603e microprocessor, abbreviated as PID6-603e).

1.1 Overview

This section describes the details of the 603e, provides a block diagram showing the major functional units, and describes briefly how those units interact. Any differences between the PID6-603e, PID7v-603e, and EC603E implementations are noted.

The 603e is a low-power implementation of the PowerPC microprocessor family of reduced instruction set computing (RISC) microprocessors. The 603e implements the 32-bit portion of the PowerPC architecture, which provides 32-bit effective addresses, integer data types of 8, 16, and 32 bits, and floating-point data types of 32 and 64 bits.

The 603e is a superscalar processor that can issue and retire as many as three instructions per clock. Instructions can execute out of order for increased performance; however, the 603e makes completion appear sequential.

The 603e integrates five execution units: an integer unit (IU), a floating-point unit (FPU) (not supported on the EC603E microprocessor), a branch processing unit (BPU), a load/store unit (LSU), and a system register unit (SRU). The ability to execute five instructions in parallel and the use of simple instructions with rapid execution times yield high efficiency and throughput for 603e-based systems. Most integer instructions execute in one clock cycle. On the 603e, the FPU is pipelined so a single-precision multiply-add instruction can be issued and completed every clock cycle. (Note that the EC603E microprocessor does not support the floating-point unit.)

The 603e provides independent on-chip, 16-Kbyte, four-way set-associative, physically addressed caches for instructions and data and on-chip instruction and data memory management units (MMUs). The MMUs contain 64-entry, two-way set-associative, data and instruction translation lookaside buffers (DTLB and ITLB) that provide support for
demand-paged virtual memory address translation and variable-sized block translation. The TLBs and caches use a least recently used (LRU) replacement algorithm. The 603e also supports block address translation through the use of two independent instruction and data block address translation (IBAT and DBAT) arrays of four entries each. Effective addresses are compared simultaneously with all four entries in the BAT array during block translation. In accordance with the PowerPC architecture, if an effective address hits in both the TLB and BAT array, the BAT translation takes priority.

The 603e has a selectable 32- or 64-bit data bus and a 32-bit address bus. The 603e interface protocol allows multiple masters to compete for system resources through a central external arbiter. The 603e provides a three-state coherency protocol that supports the exclusive, modified, and invalid cache states. This protocol is a compatible subset of the MESI (modified/exclusive/shared/invalid) four-state protocol and operates coherently in systems that contain four-state caches. The 603e supports single-beat and burst data transfers for memory accesses, and supports memory-mapped I/O operations.

The 603e is fabricated using an advanced CMOS process technology and is fully compatible with TTL devices.

1.1.1 Features

This section describes the major features of the 603e, noting where the PID6-603e, PID7v-603e, and EC603E implementations differ:

- High-performance, superscalar microprocessor
  - As many as three instructions issued and retired per clock
  - As many as five instructions in execution per clock
  - Single-cycle execution for most instructions
  - Pipelined FPU for all single-precision and most double-precision operations (The EC603E microprocessor does not support the floating-point unit.)
- Five independent execution units and two register files
  - BPU featuring static branch prediction
  - A 32-bit IU
  - Fully IEEE 754-compliant FPU for both single- and double-precision operations (The EC603E microprocessor does not support the floating-point unit.)
  - LSU for data transfer between data cache and GPRs and FPRs (The EC603E microprocessor does not support the floating-point unit.)
  - SRU that executes condition register (CR), special-purpose register (SPR), and integer add/compare instructions
  - Thirty-two GPRs for integer operands
  - Thirty-two FPRs for single- or double-precision operands (The EC603E microprocessor does not support the floating-point unit.)
- High instruction and data throughput
  - Zero-cycle branch capability (branch folding)
  - Programmable static branch prediction on unresolved conditional branches
  - Instruction fetch unit capable of fetching two instructions per clock from the instruction cache
  - A six-entry instruction queue that provides lookahead capability
  - Independent pipelines with feed-forwarding that reduces data dependencies in hardware
  - 16-Kbyte data cache: four-way set-associative, physically addressed; LRU replacement algorithm
  - 16-Kbyte instruction cache: four-way set-associative, physically addressed; LRU replacement algorithm
  - Cache write-back or write-through operation programmable on a per page or per block basis
  - BPU that performs CR lookahead operations
  - Address translation facilities for 4-Kbyte page size, variable block size, and 256-Mbyte segment size
  - A 64-entry, two-way set-associative ITLB
  - A 64-entry, two-way set-associative DTLB
  - Four-entry data and instruction BAT arrays providing 128-Kbyte to 256-Mbyte blocks
  - Software table search operations and updates supported through fast trap mechanism
  - 52-bit virtual address; 32-bit physical address
- Facilities for enhanced system performance
  - A 32- or 64-bit split-transaction external data bus with burst transfers
  - Support for one-level address pipelining and out-of-order bus transactions
  - Hardware support for misaligned little-endian accesses (PID7v-603e)
- Integrated power management
  - Low-power 2.5-volt and 3.3-volt designs
  - Internal processor/bus clock multiplier ratios as follows:
    - 1/1, 1.5/1, 2/1, 2.5/1, 3/1, 3.5/1, and 4/1 (PID6-603e)
    - 2/1, 2.5/1, 3/1, 3.5/1, 4/1, 4.5/1, 5/1, 5.5/1, and 6/1 (PID7v-603e)
  - Three power-saving modes: doze, nap, and sleep
  - Automatic dynamic power reduction when internal functional units are idle
- In-system testability and debugging features through JTAG boundary-scan capability

Features specific to the PID7v-603e follow:

- Enhancements to the register set
  - The PID7v-603e adds two new bits to the HID0 register:
    - The address bus enable (ABE) bit, bit 28, gives the PID7v-603e microprocessor the ability to broadcast dcbf, dcbi, and dcbst onto the 60x bus.
    - The instruction fetch enable M (IFEM) bit, bit 24, allows the PID7v-603e to reflect the value of the M bit onto the 60x bus during instruction translation.
  - The RUN_N counter register (Run_N) has been extended from 16 to 32 bits.
- Enhancements to cache implementation
  - The instruction cache is blocked only until the critical load completes (hit under reloads allowed).
  - The critical double word is simultaneously written to the cache and forwarded to the requesting unit, thus minimizing stalls due to load delays.
  - Provides for an optional data cache operation broadcast feature (enabled by the HID0[ABE] bit) that allows for correct system management utilizing an external copyback L2 cache.
  - All of the cache control instructions (icbi, dcbi, dcbf, and dcbst, excluding dcbz) require that the HID0[ABE] configuration bit be enabled in order to execute.
- Exceptions
  - The PID7v-603e now offers hardware support for misaligned little-endian accesses. Little-endian load/store accesses that are not on a word boundary, with the exception of strings and multiples, generate exceptions under the same circumstances as big-endian accesses.
  - The PID7v-603e removed misalignment support for eciwx and ecowx graphics instructions. These instructions cause an alignment exception if the access is not on a word boundary.
- Bus clock: New bus multipliers of 4.5x, 5x, 5.5x, and 6x that are selected by the unused encodings of the PLL_CFG[0-3]. Bus multipliers of 1x and 1.5x are not supported by the PID7v-603e.
- Power management: Internal voltage supply changed from 3.3 volts to 2.5 volts. The core logic of the chip now uses a 2.5-volt supply.
- Signals: The RUN_N counter, which affects the JTAG/COP, has been extended from 16 bits to 32 bits.
- Instruction timing
  - The integer divide instructions divwu[o][.] and divw[o][.] execute in 20 clock cycles; execution of these instructions in the PID6-603e requires 37 clock cycles.
  - Support for single-cycle store
  - An adder/comparator added to the system register unit that allows dispatch and execution of multiple integer add and compare instructions on each cycle.

Figure 1-1 provides a block diagram of the 603e that illustrates how the execution units (IU, FPU (not supported by the EC603E microprocessor), BPU, LSU, and SRU) operate independently and in parallel. Note that this is a conceptual diagram and does not attempt to show how these features are physically implemented on the chip. For more information on the execution units, refer to PowerPC 603e RISC Microprocessor Technical Summary.

The 603e provides address translation and protection facilities, including an ITLB, DTLB, and instruction and data BAT arrays. Instruction fetching and issuing is handled in the instruction unit. Translation of addresses for cache or external memory accesses is handled by the MMUs. Both units are discussed in more detail in Sections 1.1.3, "Instruction Unit," and 1.1.5.1, "Memory Management Units (MMUs)."
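The HID0 bit positions given in the PID7v-603e feature list above (ABE at bit 28, IFEM at bit 24) follow the PowerPC convention in which bit 0 is the most-significant bit of a register. The following is a minimal sketch, not taken from this manual, of how those positions map to 32-bit masks. The helper names (ppc_bit_mask, set_bits, clear_bits) are hypothetical, and the sketch only manipulates integer values; it does not access the register, which on real hardware requires supervisor-level move to/from SPR instructions.

```python
# Sketch only: convert PowerPC bit numbers (bit 0 = most-significant bit)
# into masks. All names here are illustrative assumptions.

def ppc_bit_mask(bit, width=32):
    """Return the mask for PowerPC-numbered bit `bit` of a `width`-bit register."""
    if not 0 <= bit < width:
        raise ValueError(f"bit {bit} outside 0..{width - 1}")
    return 1 << (width - 1 - bit)

# Bit positions as stated in the PID7v-603e feature list.
HID0_IFEM = ppc_bit_mask(24)  # instruction fetch enable M
HID0_ABE = ppc_bit_mask(28)   # address bus enable

def set_bits(value, mask):
    """Return `value` with the bits in `mask` set (32-bit result)."""
    return (value | mask) & 0xFFFFFFFF

def clear_bits(value, mask):
    """Return `value` with the bits in `mask` cleared (32-bit result)."""
    return value & ~mask & 0xFFFFFFFF

if __name__ == "__main__":
    print(f"HID0[ABE]  mask = 0x{HID0_ABE:08X}")
    print(f"HID0[IFEM] mask = 0x{HID0_IFEM:08X}")
```

Because bit 0 is the most-significant bit, a higher bit number corresponds to a smaller mask value, which is the opposite of the numbering used by many other architectures.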
Figure 1-1. Block Diagram

[Block diagram not reproduced. Blocks shown: instruction unit (sequential fetcher, instruction queue, dispatch unit); branch processing unit (CTR, CR, LR); integer unit (XER); floating-point unit* (FPSCR); load/store unit; system register unit; completion unit; GPR file and GP rename registers; FPR file* and FP rename registers; I MMU and D MMU (SRs, ITLB and DTLB, IBAT and DBAT arrays); 16-Kbyte I cache and D cache with tags; processor bus interface (touch load buffer, copyback buffer) with 32-bit address bus and 32-/64-bit data bus; time base counter/decrementer; clock multiplier; JTAG/COP interface; power dissipation control.]

* Note that the EC603E microprocessor does not support the floating-point unit or the floating-point register file.
1.1.2 System Design and Programming Considerations

The 603e is built upon the low power dissipation, low cost, and high performance attributes of the 603 while providing the system designer additional capabilities through higher processor clock speeds (to 100 MHz), increases in cache size (16-Kbyte instruction and data caches) and set associativity (four-way), and greater system clock flexibility. The following subsections describe the differences between the 603 and the 603e that affect the system designer and programmer already familiar with the operation of the 603.

The design enhancements to the 603e are described in the following sections as changes that can require a modification to the hardware or software configuration of a system designed for the 603.

1.1.2.1 Hardware Features

The following hardware features of the 603e may require system designers to modify systems designed for the 603.

1.1.2.1.1 Replacement of XATS Signal by CSE1 Signal

The 603e employs four-way set associativity for both the instruction and data caches, in place of the two-way set associativity used in the 603. This change requires the use of an additional cache set entry (CSE1) signal to indicate which member of the cache set is being loaded during a cache line fill. The CSE1 signal on the 603e is in the same pin location as the XATS signal on the 603. Note that the XATS signal is no longer needed by the 603e because support for access to direct-store segments has been removed.

Table 1-1 shows the CSE[0-1] signal encoding indicating the cache set element selected during a cache load operation.

Table 1-1. CSE[0-1] Signals

CSE[0-1]   Cache Set Element
00         Set 0
01         Set 1
10         Set 2
11         Set 3

1.1.2.1.2 Addition of Half-Clock Bus Multipliers

Some of the reserved clock configuration signal settings of the 603 are redefined to allow more flexible selection of higher internal and bus clock frequencies. The 603e provides programmable internal processor clock rates of 1x, 1.5x, 2x, 2.5x, 3x, 3.5x, and 4x multiples of the externally supplied clock frequency. For additional information, refer to the appropriate device-specific hardware specifications.
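As a small worked example of the multiplier selection described in this section, the sketch below computes the internal processor clock from a bus clock and a multiplier, and rejects ratios that the text does not list for a given part. The ratio sets are the ones stated in this chapter for the PID6-603e and PID7v-603e; the function and table names are hypothetical. The sketch does not model the PLL_CFG pin encodings or the permitted frequency ranges, which are given only in the device-specific hardware specifications.

```python
# Sketch only: processor clock = bus clock x multiplier, restricted to the
# ratios listed in this chapter. Names are illustrative assumptions.

SUPPORTED_RATIOS = {
    "PID6-603e": (1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0),
    "PID7v-603e": (2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0),
}

def core_clock_mhz(part, bus_mhz, ratio):
    """Return the internal clock in MHz, or raise if the ratio is not listed."""
    if part not in SUPPORTED_RATIOS:
        raise KeyError(f"unknown part: {part}")
    if ratio not in SUPPORTED_RATIOS[part]:
        raise ValueError(f"{ratio}x is not listed for {part}")
    return bus_mhz * ratio

if __name__ == "__main__":
    # A 25-MHz bus with the 4x multiplier gives a 100-MHz processor clock.
    print(core_clock_mhz("PID6-603e", 25.0, 4.0))
```

Keeping the ratio sets in one table makes the difference between the two parts explicit: a design that relies on the 1x or 1.5x setting cannot be moved to the PID7v-603e without changing the bus clock.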
1.1.2.2 Software Features

The features of the 603e described in the following sections affect software originally written for the 603.

1.1.2.2.1 16-Kbyte Instruction and Data Caches

The instruction and data caches of the 603e are 16 Kbytes in size, compared to the 8-Kbyte instruction and data caches of the 603. The increase in cache size may require modification of cache flush routines. The increase in cache size is also reflected in four-way set associativity of the instruction and data caches in place of the two-way set associativity in the 603.

1.1.2.2.2 Clock Configuration Available in HID1 Register

Bits 0-3 in the new HID1 register (SPR 1009) provide software read-only access to the configuration of the PLL_CFG signals. The HID1 register is not implemented in the 603.

1.1.2.2.3 Performance Enhancements

The following enhancements provide improved performance without any required changes to software (other than compiler optimization) or hardware designed for the 603:

- Support for single-cycle store.
- Addition of an adder/comparator in the system register unit allows dispatch and execution of multiple integer add and compare instructions on each cycle.
- Addition of a key bit (bit 12) to SRR1 to provide information about memory protection violations prior to page table search operations. This key bit is set when the combination of the settings in the appropriate Kx bit in the segment register and the MSR[PR] bit indicates that when the PP bits in the PTE are set to either 00 or 01, a protection violation exists; if this is the case for a data write operation with a DTLB miss, the changed (C) bit in the page tables should not be updated (see Table 1-2). This reduces the time required to execute the page table search routine since the software no longer has to explicitly read both the Kx and MSR[PR] bits to determine whether a protection violation exists before updating the C bit.

Table 1-2. Generated SRR1[KEY] Bit

Segment Register [Ks, Kp]   MSR[PR]   SRR1[KEY] Generated on DTLB Misses
0x                          0         0
x0                          1         0
1x                          0         1
x1                          1         1

Note that this key bit indicates a protection violation if the PTE[PP] bits are either 00 or 01.
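Table 1-2 can be read as a two-input selection: the key is Ks when MSR[PR] = 0 and Kp when MSR[PR] = 1. The sketch below expresses that selection and the resulting check that a software table search routine might make before updating the C bit on a store. It is an illustration under stated assumptions, not the handler code: the function names are hypothetical, the inputs are plain integers rather than register reads, and only the key-related condition described in the text is modeled (other protection checks, such as read-only pages, are outside its scope).

```python
# Sketch only: the selection implied by Table 1-2 and the key-related
# protection check described in the text. Names are illustrative assumptions.

def srr1_key(ks, kp, msr_pr):
    """Return the key bit: Ks in supervisor mode (PR=0), Kp in user mode (PR=1)."""
    return kp if msr_pr else ks

def write_violation_from_key(key, pp):
    """True if the key bit is set and PTE[PP] is 0b00 or 0b01.

    In that case the text says the changed (C) bit should not be updated
    for a data write that missed in the DTLB.
    """
    return key == 1 and pp in (0b00, 0b01)

if __name__ == "__main__":
    # Reproduce the four rows of Table 1-2 ('x' entries tried as both 0 and 1).
    for ks, kp, pr in [(0, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1),
                       (1, 0, 0), (1, 1, 0), (0, 1, 1), (1, 1, 1)]:
        print(f"Ks={ks} Kp={kp} PR={pr} -> KEY={srr1_key(ks, kp, pr)}")
```

With the key delivered in SRR1, the miss handler needs only the PP bits from the PTE it has just found to make this decision, which is the time saving the text describes.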
1.1.3 Instruction Unit

As shown in Figure 1-1, the 603e instruction unit, which contains a fetch unit, instruction queue, dispatch unit, and BPU, provides centralized control of instruction flow to the execution units. The instruction unit determines the address of the next instruction to be fetched based on information from the sequential fetcher and from the BPU.

The instruction unit fetches the instructions from the instruction cache into the instruction queue. The BPU extracts branch instructions from the fetcher and uses static branch prediction on unresolved conditional branches to allow the instruction unit to fetch instructions from a predicted target instruction stream while a conditional branch is evaluated. The BPU folds out branch instructions for unconditional branches or conditional branches unaffected by instructions in progress in the execution pipeline.

Instructions issued beyond a predicted branch do not complete execution until the branch is resolved, preserving the programming model of sequential execution. If any of these instructions are to be executed in the BPU, they are decoded but not issued. Instructions to be executed by the FPU, IU, LSU, and SRU are issued and allowed to complete up to the register write-back stage. (Note that the FPU is not supported on the EC603E microprocessor.) Write-back is allowed when a correctly predicted branch is resolved, and instruction execution continues without interruption along the predicted path. If branch prediction is incorrect, the instruction unit flushes all predicted path instructions, and instructions are issued from the correct path.

1.1.3.1 Instruction Queue and Dispatch Unit

The instruction queue (IQ), shown in Figure 1-1, holds as many as six instructions and loads up to two instructions from the instruction unit during a single cycle. The instruction fetch unit continuously loads as many instructions as space in the IQ allows. Instructions are dispatched to their respective execution units from the dispatch unit at a maximum rate of two instructions per cycle. Dispatching is facilitated to the IU, FPU (not supported on the EC603E microprocessor), LSU, and SRU by the provision of a reservation station at each unit. The dispatch unit performs source and destination register dependency checking, determines dispatch serializations, and inhibits subsequent instruction dispatching as required.

For a more detailed overview of instruction dispatch, see Section 1.3.6, "Instruction Timing."

1.1.3.2 Branch Processing Unit (BPU)

The BPU receives branch instructions from the fetch unit and performs CR lookahead operations on conditional branches to resolve them early, achieving the effect of a zero-cycle branch in many cases.

The BPU uses a bit in the instruction encoding to predict the direction of the conditional branch. Therefore, when an unresolved conditional branch instruction is encountered, the 603e fetches instructions from the predicted target stream until the conditional branch is resolved.

The BPU contains an adder to compute branch target addresses and three user-control registers: the link register (LR), the count register (CTR), and the CR. The BPU calculates the return pointer for subroutine calls and saves it into the LR for certain types of branch instructions. The LR also contains the branch target address for the Branch Conditional to Link Register (bclrx) instruction. The CTR contains the branch target address for the Branch Conditional to Count Register (bcctrx) instruction. The contents of the LR and CTR can be copied to or from any GPR. Because the BPU uses dedicated registers rather than GPRs or FPRs, execution of branch instructions is largely independent from execution of integer and floating-point instructions.

1.1.4 Independent Execution Units

The PowerPC architecture's support for independent execution units allows implementation of processors with out-of-order instruction execution. For example, because branch instructions do not depend on GPRs or FPRs, branches can often be resolved early, eliminating stalls caused by taken branches.

In addition to the BPU, the 603e provides four other execution units and a completion unit, which are described in the following sections.

1.1.4.1 Integer Unit (IU)

The IU executes all integer instructions. The IU executes one integer instruction at a time, performing computations with its arithmetic logic unit (ALU), multiplier, divider, and XER register. Most integer instructions are single-cycle instructions. Thirty-two general-purpose registers are provided to support integer operations. Stalls due to contention for GPRs are minimized by the automatic allocation of rename registers. The 603e writes the contents of the rename registers to the appropriate GPR when integer instructions are retired by the completion unit.
1.1.4.2 Floating-Point Unit (FPU)

The FPU (not supported by the EC603E microprocessor) contains a single-precision multiply-add array and the floating-point status and control register (FPSCR). The multiply-add array allows the 603e to efficiently implement multiply and multiply-add operations. The FPU is pipelined so that single-precision instructions and double-precision instructions can be issued back-to-back. Thirty-two floating-point registers are provided to support floating-point operations. Stalls due to contention for FPRs are minimized by the automatic allocation of rename registers. The 603e writes the contents of the rename registers to the appropriate FPR when floating-point instructions are retired by the completion unit.

The 603e supports all IEEE 754 floating-point data types (normalized, denormalized, NaN, zero, and infinity) in hardware, eliminating the latency incurred by software exception routines. (The term "exception" is also referred to as "interrupt" in the architecture specification.)

1.1.4.3 Load/Store Unit (LSU)

The LSU executes all load and store instructions and provides the data transfer interface between the GPRs, FPRs, and the cache/memory subsystem. The LSU calculates effective addresses, performs data alignment, and provides sequencing for load/store string and multiple instructions. (Note that the EC603E microprocessor does not support the floating-point register file.)

Load and store instructions are issued and translated in program order; however, the actual memory accesses can occur out of order. Synchronizing instructions are provided to enforce strict ordering.

Cacheable loads, when free of data dependencies, execute in an out-of-order manner with a maximum throughput of one per cycle and a two-cycle total latency. Data returned from the cache is held in a rename register until the completion logic commits the value to a GPR or FPR (not supported by the EC603E microprocessor). Stores cannot be executed in a predicted manner and are held in the store queue until the completion logic signals that the store operation is to be completed to memory. The 603e executes store instructions with a maximum throughput of one per cycle and a three-cycle total latency. The time required to perform the actual load or store operation varies depending on whether the operation involves the cache, system memory, or an I/O device.

1.1.4.4 System Register Unit (SRU)

The SRU executes various system-level instructions, including condition register logical operations and move to/from special-purpose register instructions, and also executes integer add/compare instructions. In order to maintain system state, most instructions executed by the SRU are completion-serialized; that is, the instruction is held for execution in the SRU until all prior instructions issued have completed.
results from completion-serialized instructions executed by the sru are not available or forwarded for subsequent instructions until the instruction completes.

1.1.4.5 completion unit

the completion unit tracks instructions from dispatch through execution, and then retires, or "completes," them in program order. completing an instruction commits the 603e to any architectural register changes caused by that instruction. in-order completion ensures the correct architectural state when the 603e must recover from a mispredicted branch or any exception.

instruction state and other information required for completion is kept in a first-in-first-out (fifo) queue of five completion buffers. a single completion buffer is allocated for each instruction once it enters the dispatch unit. an available completion buffer is a required resource for instruction dispatch; if no completion buffers are available, instruction
dispatch stalls. a maximum of two instructions per cycle are completed in order from the queue.

1.1.5 memory subsystem support

the 603e provides support for cache and memory management through dual instruction and data memory management units. the 603e also provides dual 16-kbyte instruction and data caches, and an efficient processor bus interface to facilitate access to main memory and other bus subsystems. the memory subsystem support functions are described in the following subsections.

1.1.5.1 memory management units (mmus)

the 603e's mmus support up to 4 petabytes (2^52) of virtual memory and 4 gigabytes (2^32) of physical memory (referred to as real memory in the architecture specification) for instruction and data. the mmus also control access privileges for these spaces on block and page granularities. referenced and changed status is maintained by the processor for each page to assist implementation of a demand-paged virtual memory system. a key bit is implemented to provide information about memory protection violations prior to page table search operations.

the lsu calculates effective addresses for data loads and stores, performs data alignment to and from cache memory, and provides the sequencing for load and store string and multiple word instructions. the instruction unit calculates the effective addresses for instruction fetching.

after an address is generated, the higher-order bits of the effective address are translated by the appropriate mmu into physical address bits. simultaneously, the lower-order address bits (which are untranslated and therefore considered both logical and physical) are directed to the on-chip caches, where they form the index into the four-way set-associative tag array. after translating the address, the mmu passes the higher-order bits of the physical address to the cache, and the cache lookup completes.
for caching-inhibited accesses or accesses that miss in the cache, the untranslated lower-order address bits are concatenated with the translated higher-order address bits; the resulting 32-bit physical address is then used by the memory unit and the system interface, which accesses external memory. the mmu also directs the address translation and enforces the protection hierarchy programmed by the operating system in relation to the supervisor/user privilege level of the access and in relation to whether the access is a load or store. for instruction accesses, the mmu performs an address lookup in both the 64 entries of the itlb, and in the ibat array. if an effective address hits in both the itlb and the ibat array, the ibat array translation takes priority. data accesses cause a lookup in the dtlb and dbat array for the physical address translation. in most cases, the physical address translation resides in one of the tlbs and the physical address bits are readily available to the on-chip cache.
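the concatenation of translated and untranslated address bits described above can be illustrated with a minimal sketch. it assumes page-address translation with 4-kbyte pages (a 12-bit untranslated offset), which this section does not state explicitly; the function names and the example values are hypothetical and are not part of any motorola software.

```python
PAGE_OFFSET_BITS = 12  # assumption: 4-kbyte pages, so the low 12 bits are untranslated
MASK32 = 0xFFFFFFFF


def physical_address(effective_address: int, physical_page_number: int) -> int:
    """concatenate the translated higher-order bits with the untranslated lower-order bits."""
    offset = effective_address & ((1 << PAGE_OFFSET_BITS) - 1)
    return ((physical_page_number << PAGE_OFFSET_BITS) | offset) & MASK32


def choose_translation(bat_hit: bool, tlb_hit: bool) -> str:
    """model the stated priority: a bat array hit wins over a tlb hit."""
    if bat_hit:
        return "bat"
    if tlb_hit:
        return "tlb"
    return "miss"  # on the 603e a tlb miss is serviced by software with hardware assistance


# hypothetical example: effective address 0x00012abc mapped to physical page 0x54321
print(hex(physical_address(0x00012ABC, 0x54321)))
```

block (bat) translations cover larger regions, so the split point between translated and untranslated bits differs there; the sketch covers only the page case.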
when the physical address translation misses in the tlbs, the 603e provides hardware assistance for software to perform a search of the translation tables in memory. the hardware assist consists of the following features:

- automatic storage of the missed effective address in the imiss and dmiss registers
- automatic generation of the primary and secondary hashed real address of the page table entry group (pteg), which are readable from the hash1 and hash2 register locations. the hash data is generated from the contents of the imiss or dmiss register. which register is selected depends on which miss (instruction or data) was last acknowledged.
- automatic generation of the first word of the page table entry (pte) for which the tables are being searched
- a real page address (rpa) register that matches the format of the lower word of the pte
- two tlb access instructions (tlbli and tlbld) that are used to load an address translation into the instruction or data tlbs
- shadow registers for gprs 0-3 that allow miss code to execute without corrupting the state of any of the existing gprs. these shadow registers are only used for servicing a tlb miss.

see section 1.3.5.2, "implementation-specific memory management," for more information about memory management for the 603e.

1.1.5.2 cache units

the 603e provides independent 16-kbyte, four-way set-associative instruction and data caches. the cache line size is 32 bytes. the caches are designed to adhere to a write-back policy, but the 603e allows control of cacheability, write policy, and memory coherency at the page and block levels. the caches use a least recently used (lru) replacement policy.

as shown in figure 1-1, the caches provide a 64-bit interface to the instruction fetch unit and load/store unit. the surrounding logic selects, organizes, and forwards the requested information to the requesting unit.
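the cache geometry stated above (16 kbytes, four ways, 32-byte lines) fixes the number of sets and therefore which address bits select a set. the following is a sketch of that arithmetic only; the helper names are hypothetical, and the exact bit assignment used by the hardware is an assumption derived from the sizes given here.

```python
CACHE_BYTES = 16 * 1024  # per cache (instruction or data)
WAYS = 4                 # four-way set-associative
LINE_BYTES = 32          # cache line size

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 16384 / 128 = 128 sets


def line_offset(address: int) -> int:
    """byte position within a 32-byte cache line."""
    return address % LINE_BYTES


def set_index(address: int) -> int:
    """set selected by the address bits just above the line offset."""
    return (address // LINE_BYTES) % SETS


print(SETS, set_index(0x1234), line_offset(0x1234))
```

with 128 sets and 32-byte lines, the set index and line offset together need 7 + 5 = 12 address bits. if pages are 4 kbytes (an assumption not stated in this section), those 12 bits fall entirely within the untranslated part of the address, which is consistent with the tag lookup starting in parallel with translation.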
write operations to the cache can be performed on a byte basis, and a complete read-modify-write operation to the cache can occur in each cycle.

the load/store and instruction fetch units provide the caches with the address of the data or instruction to be fetched. in the case of a cache hit, the cache returns two words to the requesting unit.

since the 603e data cache tags are single ported, simultaneous load or store and snoop accesses cause resource contention. snoop accesses have the highest priority and are given first access to the tags, unless the snoop access coincides with a tag write, in which case the
snoop is retried and must re-arbitrate for access to the cache. loads or stores that are deferred due to snoop accesses are executed on the clock cycle following the snoop.

1.1.6 processor bus interface

because the caches on the 603e are on-chip, write-back caches, the predominant type of transaction for most applications is burst-read memory operations, followed by burst-write memory operations, and single-beat (noncacheable or write-through) memory read and write operations. additionally, there can be address-only operations, variants of the burst and single-beat operations (for example, global memory operations that are snooped and atomic memory operations), and address retry activity (for example, when a snooped read access hits a modified line in the cache).

memory accesses can occur in single-beat (1 to 8 bytes) and four-beat burst (32 bytes) data transfers when the bus is configured as 64 bits, and in single-beat (1 to 4 bytes), two-beat (8 bytes), and eight-beat (32 bytes) data transfers when the bus is configured as 32 bits. the address and data buses operate independently to support pipelining and split transactions during memory accesses. the 603e can pipeline its own transactions to a depth of one level.

access to the system interface is granted through an external arbitration mechanism that allows devices to compete for bus mastership. this arbitration mechanism is flexible, allowing the 603e to be integrated into systems that implement various fairness and bus parking procedures to avoid arbitration overhead.

typically, memory accesses are weakly ordered: sequences of operations, including load/store string and multiple instructions, do not necessarily complete in the order they begin, which maximizes the efficiency of the bus without sacrificing coherency of the data.
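the beat counts quoted above follow directly from the transfer size and the configured data bus width. a small sketch of the arithmetic (the function name is hypothetical):

```python
def beats(transfer_bytes: int, bus_width_bits: int) -> int:
    """number of data beats needed to move transfer_bytes over the data bus."""
    bus_bytes = bus_width_bits // 8
    return -(-transfer_bytes // bus_bytes)  # ceiling division


# a 32-byte cache line burst
print(beats(32, 64), beats(32, 32))
# an 8-byte access on each bus width
print(beats(8, 64), beats(8, 32))
```

this models only the data tenure length; it says nothing about address tenures, arbitration, or wait states.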
the 603e allows read operations to precede store operations (except when a dependency exists, or in cases where a non-cacheable access is performed), and provides support for a write operation to precede a previously queued read data tenure (for example, allowing a snoop push to be enveloped by the address and data tenures of a read operation). because the processor can dynamically optimize run-time ordering of load/store traffic, overall performance is improved.

1.1.7 system support functions

the 603e implements several support functions that include power management, time base/decrementer registers for system timing tasks, an ieee 1149.1 (jtag)/common on-chip processor (cop) test interface, and a phase-locked loop (pll) clock multiplier. these system support functions are described in the following subsections.
1.1.7.1 power management

the 603e provides four power modes selectable by setting the appropriate control bits in the machine state register (msr) and hardware implementation register 0 (hid0). the four power modes are as follows:

- full-power: this is the default power state of the 603e. the 603e is fully powered and the internal functional units are operating at the full processor clock speed. if the dynamic power management mode is enabled, functional units that are idle will automatically enter a low-power state without affecting performance, software execution, or external hardware.
- doze: all the functional units of the 603e are disabled except for the time base/decrementer registers and the bus snooping logic. when the processor is in doze mode, an external asynchronous interrupt, a system management interrupt, a decrementer exception, a hard or soft reset, or machine check brings the 603e into the full-power state. the 603e in doze mode maintains the pll in a fully powered state and locked to the system external clock input (sysclk) so a transition to the full-power state takes only a few processor clock cycles.
- nap: the nap mode further reduces power consumption by disabling bus snooping, leaving only the time base register and the pll in a powered state. the 603e returns to the full-power state upon receipt of an external asynchronous interrupt, a system management interrupt, a decrementer exception, a hard or soft reset, or a machine check input (mcp) signal. a return to full-power state from a nap state takes only a few processor clock cycles.
- sleep: sleep mode reduces power consumption to a minimum by disabling all internal functional units, after which external system logic may disable the pll and sysclk.
  returning the 603e to the full-power state requires the enabling of the pll and sysclk, followed by the assertion of an external asynchronous interrupt, a system management interrupt, a hard or soft reset, or a machine check input (mcp) signal after the time required to relock the pll.

the pid7v-603e implementation offers the following enhancements to the 603e family:

- lower-power design
- 2.5-volt core and 3.3-volt i/o

1.1.7.2 time base/decrementer

the time base is a 64-bit register (accessed as two 32-bit registers) that is incremented once every four bus clock cycles; external control of the time base is provided through the time base enable (tben) signal. the decrementer is a 32-bit register that generates a decrementer interrupt exception after a programmable delay. the contents of the decrementer register are decremented once every four bus clock cycles, and the decrementer exception is generated as the count passes through zero.
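because both counters advance once every four bus clock cycles, converting between counts and wall-clock time needs only the bus frequency. the sketch below shows that conversion; the 66 mhz bus frequency in the example and all function names are hypothetical, chosen for illustration only.

```python
BUS_CLOCKS_PER_TICK = 4  # time base and decrementer step once every four bus clocks


def decrementer_count(delay_us: int, bus_hz: int) -> int:
    """approximate decrementer load value for a delay given in microseconds."""
    count = delay_us * bus_hz // (BUS_CLOCKS_PER_TICK * 1_000_000)
    if count > 0xFFFFFFFF:
        raise ValueError("delay does not fit in the 32-bit decrementer")
    return count


def time_base_value(tbu: int, tbl: int) -> int:
    """combine the two 32-bit halves into the 64-bit time base value."""
    return ((tbu & 0xFFFFFFFF) << 32) | (tbl & 0xFFFFFFFF)


def ticks_to_seconds(ticks: int, bus_hz: int) -> float:
    """elapsed time represented by a number of time base ticks."""
    return ticks * BUS_CLOCKS_PER_TICK / bus_hz


# hypothetical 66 mhz bus: load value for a 1 ms delay
print(decrementer_count(1000, 66_000_000))
```

the sketch ignores the exact cycle on which the exception is recognized as the count passes through zero, so treat the result as approximate.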
1.1.7.3 ieee 1149.1 (jtag)/cop test interface

the 603e provides ieee 1149.1 and cop functions for facilitating board testing and chip debug. the ieee 1149.1 test interface provides a means for boundary-scan testing the 603e and the board to which it is attached. the cop function shares the ieee 1149.1 test port, provides a means for executing test routines, and facilitates chip and software debugging.

1.1.7.4 clock multiplier

the internal clocking of the 603e is generated from and synchronized to the external clock signal, sysclk, by means of a voltage-controlled oscillator-based pll. the pll provides programmable internal processor clock rates of 1x, 1.5x, 2x, 2.5x, 3x, 3.5x, and 4x multiples of the externally supplied clock frequency. the bus clock is the same frequency as, and is synchronous with, sysclk. the configuration of the pll can be read by software from the hardware implementation register 1 (hid1).

1.2 powerpc architecture implementation

the powerpc architecture consists of the following layers, and adherence to the powerpc architecture can be measured in terms of which of the following levels of the architecture is implemented:

- powerpc user instruction set architecture (uisa): defines the base user-level instruction set, user-level registers, data types, floating-point exception model, memory models for a uniprocessor environment, and programming model for a uniprocessor environment.
- powerpc virtual environment architecture (vea): describes the memory model for a multiprocessor environment, defines cache control instructions, and describes other aspects of virtual environments. implementations that conform to the vea also adhere to the uisa, but may not necessarily adhere to the oea.
- powerpc operating environment architecture (oea): defines the memory management model, supervisor-level registers, synchronization requirements, and the exception model.
  implementations that conform to the oea also adhere to the uisa and the vea.

the powerpc architecture allows a wide range of designs for such features as cache and system interface implementations.

1.3 implementation-specific information

the powerpc architecture is derived from the ibm power architecture (performance optimized with enhanced risc architecture). the powerpc architecture shares the benefits of the power architecture optimized for single-chip implementations. the powerpc architecture design facilitates parallel instruction execution and is scalable to take advantage of future technological gains.
this section describes the powerpc architecture in general, and specific details about the implementation of the 603e as a low-power, 32-bit member of the powerpc processor family. the main topics addressed are as follows:

- section 1.3.1, "programming model," describes the registers for the operating environment architecture common among powerpc processors and describes the programming model. it also describes the additional registers that are unique to the 603e.
- section 1.3.2, "instruction set and addressing modes," describes the powerpc instruction set and addressing modes for the powerpc operating environment architecture, and defines and describes the powerpc instructions implemented in the 603e.
- section 1.3.3, "cache implementation," describes the cache model that is defined generally for powerpc processors by the virtual environment architecture. it also provides specific details about the 603e cache implementation.
- section 1.3.4, "exception model," describes the exception model of the powerpc operating environment architecture and the differences in the 603e exception model.
- section 1.3.5, "memory management," describes generally the conventions for memory management among the powerpc processors. this section also describes the 603e's implementation of the 32-bit powerpc memory management specification.
- section 1.3.6, "instruction timing," provides a general description of the instruction timing provided by the superscalar, parallel execution supported by the powerpc architecture and the 603e.
- section 1.3.7, "system interface," describes the signals implemented on the 603e.

the 603e is a high-performance, superscalar powerpc microprocessor. the powerpc architecture allows optimizing compilers to schedule instructions to maximize performance through efficient use of the powerpc instruction set and register model. the multiple, independent execution units allow compilers to optimize instruction throughput.
compilers that take advantage of the flexibility of the powerpc architecture can additionally optimize system performance of the powerpc processors.

the following sections summarize the features of the 603e, including both those that are defined by the architecture and those that are unique to the various 603e implementations. specific features of the 603e are listed in section 1.1.1, "features."

1.3.1 programming model

the powerpc architecture defines register-to-register operations for most computational instructions. source operands for these instructions are accessed from the registers or are provided as immediate values embedded in the instruction opcode. the three-register instruction format allows specification of a target register distinct from the two source operands. load and store instructions transfer data between registers and memory.
powerpc processors have two levels of privilege: supervisor mode of operation (typically used by the operating system) and user mode of operation (used by the application software). the programming models incorporate 32 gprs, 32 fprs (not supported by the EC603E microprocessor), special-purpose registers (sprs), and several miscellaneous registers. each powerpc microprocessor also has its own unique set of hardware implementation (hid) registers.

having access to privileged instructions, registers, and other resources allows the operating system to control the application environment (providing virtual memory and protecting operating-system and critical machine resources). instructions that control the state of the processor, the address translation mechanism, and supervisor registers can be executed only when the processor is operating in supervisor mode.

figure 1-2 shows all the 603e registers available at the user and supervisor level. the numbers to the right of the sprs indicate the number that is used in the syntax of the instruction operands to access the register.

the following subsections describe the pid7v-603e implementation-specific features as they apply to registers.

1.3.1.1 processor version register (pvr)

the processor version number is 6 for the pid6-603e and 7 for the pid7v-603e. the processor revision level starts at 0x0100 and changes for each chip revision. the revision level is updated on all silicon revisions.

1.3.1.2 hardware implementation register 0 (hid0)

pid7v-603e (designated by pvr level 0x0200) defines additional bits in the hardware implementation register 0 (hid0), a supervisor-level register that provides the means for enabling the 603e's checkstops and features, and allows software to read the configuration of the pll configuration signals. the hid0 bits with changed bit assignments are shown in table 1-3.
the hid0 bits that are not shown here are implemented as they are in section 2.1.2.1, "hardware implementation registers (hid0 and hid1)."

table 1-3. additional/changed hid0 bits

bit(s)   description
24       instruction fetch enable m (ifem) bit: enables the m bit on the bus. used for instruction fetches.
25-26    reserved
28       address broadcast enable (abe): this configuration bit allows for the broadcast of dcbf, dcbi, and dcbst on the bus. note that these cache control instruction broadcasts are not snooped by the pid7v-603e. refer to section 1.3.3, "cache implementation," for more information.
29-30    reserved
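the bit numbers in table 1-3 use the powerpc convention in which bit 0 is the most-significant bit of the 32-bit register. the sketch below converts those bit numbers into masks that could be or-ed into a hid0 value; the helper and constant names are hypothetical, and the example hid0 starting value is made up for illustration.

```python
def ppc_bit_mask(bit: int, width: int = 32) -> int:
    """mask for a powerpc-numbered bit, where bit 0 is the most-significant bit."""
    if not 0 <= bit < width:
        raise ValueError("bit number out of range")
    return 1 << (width - 1 - bit)


HID0_IFEM = ppc_bit_mask(24)  # instruction fetch enable m
HID0_ABE = ppc_bit_mask(28)   # address broadcast enable


def set_bits(value: int, mask: int) -> int:
    """return value with the masked bits set, kept to 32 bits."""
    return (value | mask) & 0xFFFFFFFF


hid0 = 0x00000000  # hypothetical starting value
hid0 = set_bits(hid0, HID0_ABE)
print(hex(HID0_IFEM), hex(HID0_ABE), hex(hid0))
```

on real hardware the value would be read and written with mfspr and mtspr in supervisor mode; the sketch only shows the mask arithmetic.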
1.3.1.3 run_n counter register (run_n)

the 33-bit run_n counter register is unique to the pid7v-603e. the run_n counter is used by the cop to control the number of processor cycles that the processor runs before halting. the most-significant 32 bits form a 32-bit counter. the function of the least-significant bit remains unchanged.

1.3.1.4 general-purpose registers (gprs)

the powerpc architecture defines 32 user-level, general-purpose registers (gprs). these registers are 32 bits wide in 32-bit powerpc microprocessors and 64 bits wide in 64-bit powerpc microprocessors. the gprs serve as the data source or destination for all integer instructions.

1.3.1.5 floating-point registers (fprs)

the powerpc architecture also defines 32 user-level, 64-bit floating-point registers (fprs) (not supported by the EC603E microprocessor). the fprs serve as the data source or destination for floating-point instructions. these registers can contain data objects of either single- or double-precision floating-point formats.

1.3.1.6 condition register (cr)

the cr is a 32-bit user-level register that consists of eight four-bit fields that reflect the results of certain operations, such as move, integer and floating-point compare, arithmetic, and logical instructions, and provide a mechanism for testing and branching.

1.3.1.7 floating-point status and control register (fpscr)

the floating-point status and control register (fpscr) is a user-level register that contains all exception signal bits, exception summary bits, exception enable bits, and rounding control bits needed for compliance with the ieee 754 standard. (note that this is not supported by the EC603E microprocessor.)

1.3.1.8 machine state register (msr)

the machine state register (msr) is a supervisor-level register that defines the state of the processor. the contents of this register are saved when an exception is taken and restored when the exception handling completes.
the 603e implements the msr as a 32-bit register; 64-bit powerpc processors implement a 64-bit msr. to ensure proper operation of the EC603E microprocessor, the msr[fp] bit should remain cleared to zero.

1.3.1.9 segment registers (srs)

for memory management, 32-bit powerpc microprocessors implement sixteen 32-bit segment registers (srs). to speed access, the 603e implements the segment registers as two arrays: a main array (for data memory accesses) and a shadow array (for instruction memory accesses). loading a segment entry with the move to segment register (mtsr) instruction loads both arrays.
1.3.1.10 special-purpose registers (sprs)

the powerpc operating environment architecture defines numerous special-purpose registers that serve a variety of functions, such as providing controls, indicating status, configuring the processor, and performing special operations. during normal execution, a program can access the registers, shown in figure 2-1, depending on the program's access privilege (supervisor or user, determined by the privilege-level (pr) bit in the msr). note that registers such as the gprs and fprs (not supported by the EC603E microprocessor) are accessed through operands that are part of the instructions. access to registers can be explicit (that is, through the use of specific instructions for that purpose, such as the move to special-purpose register (mtspr) and move from special-purpose register (mfspr) instructions) or implicit, as part of the execution of an instruction. some registers are accessed both explicitly and implicitly.

in the 603e, all sprs are 32 bits wide.

1.3.1.10.1 user-level sprs

the following 603e sprs are accessible by user-level software:

- link register (lr): the link register can be used to provide the branch target address and to hold the return address after branch and link instructions. the lr is 32 bits wide in 32-bit implementations.
- count register (ctr): the ctr is decremented and tested automatically as a result of branch-and-count instructions. the ctr is 32 bits wide in 32-bit implementations.
- xer register: the 32-bit xer contains the summary overflow bit, integer carry bit, overflow bit, and a field specifying the number of bytes to be transferred by a load string word indexed (lswx) or store string word indexed (stswx) instruction.

1.3.1.10.2 supervisor-level sprs

the 603e also contains sprs that can be accessed only by supervisor-level software.
these registers consist of the following:

- the 32-bit dsisr defines the cause of data access and alignment exceptions.
- the data address register (dar) is a 32-bit register that holds the address of an access after an alignment or dsi exception.
- the decrementer register (dec) is a 32-bit decrementing counter that provides a mechanism for causing a decrementer exception after a programmable delay.
- the 32-bit sdr1 specifies the page table format used in virtual-to-physical address translation for pages. (note that physical address is referred to as real address in the architecture specification.)
- the machine status save/restore register 0 (srr0) is a 32-bit register that is used by the 603e for saving the address of the instruction that caused the exception, and the address to return to when a return from interrupt (rfi) instruction is executed.
- the machine status save/restore register 1 (srr1) is a 32-bit register used to save machine status on exceptions and to restore machine status when an rfi instruction is executed.
- the 32-bit sprg0-sprg3 registers are provided for operating system use.
- the external access register (ear) is a 32-bit register that controls access to the external control facility through the external control in word indexed (eciwx) and external control out word indexed (ecowx) instructions.
- the time base register (tb) is a 64-bit register that maintains the time of day and operates interval timers. the tb consists of two 32-bit fields: time base upper (tbu) and time base lower (tbl).
- the processor version register (pvr) is a 32-bit, read-only register that identifies the version (model) and revision level of the powerpc processor.
- block address translation (bat) arrays: the powerpc architecture defines 16 bat registers, divided into four pairs of data bats (dbats) and four pairs of instruction bats (ibats). see figure 2-1 for a list of the spr numbers for the bat arrays.

the following supervisor-level sprs are implementation-specific to the 603e:

- the dmiss and imiss registers are read-only registers that are loaded automatically upon an instruction or data tlb miss.
- the hash1 and hash2 registers contain the physical addresses of the primary and secondary page table entry groups (ptegs).
- the icmp and dcmp registers contain a duplicate of the first word in the page table entry (pte) for which the table search is looking.
- the required physical address (rpa) register is loaded by the processor with the second word of the correct pte during a page table search.
- the hardware implementation (hid0 and hid1) registers provide the means for enabling the 603e's checkstops and features, and allow software to read the configuration of the pll configuration signals.
- the instruction address breakpoint register (iabr) is loaded with an instruction address that is compared to instruction addresses in the dispatch queue. when an address match occurs, an instruction address breakpoint exception is generated.

figure 2-1 shows all the 603e registers available at the user and supervisor level. the numbers to the right of the sprs indicate the number that is used in the syntax of the instruction operands to access the register.
figure 1-2. programming model: registers

user model
- general-purpose registers: gpr0 through gpr31
- floating-point registers (note 2): fpr0 through fpr31
- condition register: cr
- floating-point status and control register (note 2): fpscr
- xer: spr 1
- link register (lr): spr 8
- count register (ctr): spr 9
- time base facility (for reading): tbl (tbr 268), tbu (tbr 269)

supervisor model
- configuration registers
  - machine state register: msr
  - processor version register: pvr (spr 287)
  - hardware implementation registers (note 1): hid0 (spr 1008), hid1 (spr 1009)
- memory management registers
  - instruction bat registers: ibat0u (spr 528), ibat0l (spr 529), ibat1u (spr 530), ibat1l (spr 531), ibat2u (spr 532), ibat2l (spr 533), ibat3u (spr 534), ibat3l (spr 535)
  - data bat registers: dbat0u (spr 536), dbat0l (spr 537), dbat1u (spr 538), dbat1l (spr 539), dbat2u (spr 540), dbat2l (spr 541), dbat3u (spr 542), dbat3l (spr 543)
  - software table search registers (note 1): dmiss (spr 976), dcmp (spr 977), hash1 (spr 978), hash2 (spr 979), imiss (spr 980), icmp (spr 981), rpa (spr 982)
  - sdr1: spr 25
  - segment registers: sr0 through sr15
- exception handling registers
  - dsisr: spr 18
  - data address register: dar (spr 19)
  - save and restore: srr0 (spr 26), srr1 (spr 27)
  - sprgs: sprg0 (spr 272), sprg1 (spr 273), sprg2 (spr 274), sprg3 (spr 275)
- miscellaneous registers
  - time base facility (for writing): tbl (spr 284), tbu (spr 285)
  - decrementer: dec (spr 22)
  - external address register (optional): ear (spr 282)
  - instruction address breakpoint register (note 1): iabr (spr 1010)

notes:
1 these registers are 603e-specific (pid6-603e and pid7v-603e) registers. they may not be supported by other powerpc processors.
2 not supported on the EC603E microprocessor.
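the spr numbers in figure 1-2 are the values written as the spr operand of mfspr and mtspr. as an illustration of how such a number ends up in an instruction word, the sketch below encodes mfspr using the standard powerpc encoding (primary opcode 31, extended opcode 339, with the two 5-bit halves of the spr number swapped in the instruction field). that encoding comes from the powerpc architecture rather than from this section, and the dictionary and function names are hypothetical.

```python
# a few spr numbers taken from figure 1-2
SPR = {"xer": 1, "lr": 8, "ctr": 9, "dec": 22, "pvr": 287, "hid0": 1008, "hid1": 1009}


def encode_mfspr(rd: int, spr: int) -> int:
    """encode 'mfspr rD,SPR' as a 32-bit instruction word."""
    if not (0 <= rd < 32 and 0 <= spr < 1024):
        raise ValueError("register or spr number out of range")
    # the instruction holds the low 5 bits of the spr number first, then the high 5 bits
    spr_field = ((spr & 0x1F) << 5) | ((spr >> 5) & 0x1F)
    return (31 << 26) | (rd << 21) | (spr_field << 11) | (339 << 1)


print(hex(encode_mfspr(3, SPR["hid0"])))  # mfspr r3,1008
print(hex(encode_mfspr(0, SPR["lr"])))    # mfspr r0,8 (mflr r0)
```

an assembler performs this encoding automatically; the sketch is only meant to show where the number from the figure goes.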
1.3.2 instruction set and addressing modes

the following subsections describe the powerpc instruction set and addressing modes in general.

1.3.2.1 powerpc instruction set and addressing modes

all powerpc instructions are encoded as single-word (32-bit) opcodes. instruction formats are consistent among all instruction types, permitting efficient decoding to occur in parallel with operand accesses. this fixed instruction length and consistent format greatly simplifies instruction pipelining.

1.3.2.1.1 powerpc instruction set

the powerpc instructions are divided into the following categories:

- integer instructions: these include computational and logical instructions.
  - integer arithmetic instructions
  - integer compare instructions
  - integer logical instructions
  - integer rotate and shift instructions
- floating-point instructions: these include floating-point computational instructions, as well as instructions that affect the fpscr. (note that these instructions are not implemented on the EC603E microprocessor.)
  - floating-point arithmetic instructions
  - floating-point multiply/add instructions
  - floating-point rounding and conversion instructions
  - floating-point compare instructions
  - floating-point status and control instructions
- load/store instructions: these include integer and floating-point load and store instructions.
  - integer load and store instructions
  - integer load and store multiple instructions
  - floating-point load and store (not implemented on the EC603E microprocessor)
  - primitives used to construct atomic memory operations (lwarx and stwcx. instructions)
- flow control instructions: these include branching instructions, condition register logical instructions, trap instructions, and other instructions that affect the instruction flow.
  - branch and trap instructions
  - condition register logical instructions
- processor control instructions: these instructions are used for synchronizing memory accesses and management of caches, tlbs, and the segment registers.
  - move to/from spr instructions
  - move to/from msr
  - synchronize
  - instruction synchronize
- memory control instructions: these instructions provide control of caches, tlbs, and segment registers.
  - supervisor-level cache management instructions
  - user-level cache instructions
  - segment register manipulation instructions
  - translation lookaside buffer management instructions

note that this grouping of the instructions does not indicate which execution unit executes a particular instruction or group of instructions.

integer instructions operate on byte, half-word, and word operands. floating-point instructions operate on single-precision (one word) and double-precision (one double word) floating-point operands. the powerpc architecture uses instructions that are four bytes long and word-aligned. it provides for byte, half-word, and word operand loads and stores between memory and a set of 32 gprs. it also provides for word and double-word operand loads and stores between memory and a set of 32 floating-point registers (fprs).

computational instructions do not modify memory. to use a memory operand in a computation and then modify the same or another memory location, the memory contents must be loaded into a register, modified, and then written back to the target location with distinct instructions.

powerpc processors follow the program flow when they are in the normal execution state. however, the flow of instructions can be interrupted directly by the execution of an instruction or by an asynchronous event. either kind of exception may cause one of several components of the system software to be invoked.

1.3.2.1.2 calculating effective addresses

the effective address (ea) is the 32-bit address computed by the processor when executing a memory access or branch instruction or when fetching the next sequential instruction.

the powerpc architecture supports two simple memory addressing modes:

- ea = (ra|0) + offset (including offset = 0) (register indirect with immediate index)
- ea = (ra|0) + rb (register indirect with index)

these simple addressing modes allow efficient address generation for memory accesses. calculation of the effective address for aligned transfers occurs in a single clock cycle.
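
the two addressing modes above can be sketched in a few lines of python. this is only an illustrative model, not a description of the hardware: the helper names (ea_immediate, ea_indexed, sign_extend16) are hypothetical, the offset is assumed to be a 16-bit sign-extended displacement, and the sum is assumed to be truncated to 32 bits with any carry out discarded.

```python
MASK32 = 0xFFFFFFFF  # effective addresses are 32 bits wide

def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed displacement (assumption)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def ea_immediate(gprs, ra, offset):
    """ea = (ra|0) + offset: register indirect with immediate index."""
    base = 0 if ra == 0 else gprs[ra]  # (ra|0): register field 0 means the value 0
    return (base + sign_extend16(offset)) & MASK32

def ea_indexed(gprs, ra, rb):
    """ea = (ra|0) + rb: register indirect with index."""
    base = 0 if ra == 0 else gprs[ra]
    return (base + gprs[rb]) & MASK32

# hypothetical register contents for illustration
gprs = [0] * 32
gprs[0] = 0x1234          # ignored when r0 appears in the ra position
gprs[4] = 0x00001000
gprs[5] = 0xFFFFFFF0

ea_a = ea_immediate(gprs, 4, 8)       # base register plus small positive offset
ea_b = ea_immediate(gprs, 0, 0x20)    # ra = 0 selects the value 0, not gprs[0]
ea_c = ea_immediate(gprs, 4, 0xFFFC)  # displacement of -4
ea_d = ea_indexed(gprs, 4, 5)         # sum wraps modulo 2**32
```

in this model, using register field 0 in the ra position gives an absolute address formed from the offset alone, which is why the (ra|0) notation is needed.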

for a memory access instruction, if the sum of the effective address and the operand length exceeds the maximum effective address, the memory operand is considered to wrap around from the maximum effective address to effective address 0.

effective address computations for both data and instruction accesses use 32-bit unsigned binary arithmetic. a carry from bit 0 is ignored in 32-bit implementations.

1.3.2.2 implementation-specific instruction set

the 603e instruction set is defined as follows:

- the 603e provides hardware support for all 32-bit powerpc instructions.
- the 603e provides two implementation-specific instructions used for software table search operations following tlb misses:
  - load data tlb entry (tlbld)
  - load instruction tlb entry (tlbli)
- the 603e implements the following instructions which are defined as optional by the powerpc architecture:
  - external control in word indexed (eciwx)
  - external control out word indexed (ecowx)
  - floating select (fsel) (not supported by the EC603E microprocessor)
  - floating reciprocal estimate single-precision (fres) (not supported by the EC603E microprocessor)
  - floating reciprocal square root estimate (frsqrte) (not supported by the EC603E microprocessor)
  - store floating-point as integer word (stfiwx) (not supported by the EC603E microprocessor)

1.3.3 cache implementation

the following subsections describe the general cache characteristics as implemented in the powerpc architecture, and the 603e implementation, specifically. pid7v-603e-specific information is noted where applicable.

1.3.3.1 powerpc cache characteristics

the powerpc architecture does not define hardware aspects of cache implementations. for example, some powerpc processors, including the 603e, have separate instruction and data caches (harvard architecture), while others, such as the powerpc 601 microprocessor, implement a unified cache.

powerpc microprocessors control the following memory access modes on a page or block basis:

- write-back/write-through mode
- caching-inhibited mode
- memory coherency

note that in the 603e, a cache block is defined as eight words. the vea defines cache management instructions that provide a means by which the application programmer can affect the cache contents.

1.3.3.2 implementation-specific cache implementation

the 603e has two 16-kbyte, four-way set-associative (instruction and data) caches. the caches are physically addressed, and the data cache can operate in either write-back or write-through mode as specified by the powerpc architecture.

the data cache is configured as 128 sets of four blocks each. each block consists of 32 bytes, two state bits, and an address tag. the two state bits implement the three-state mei (modified/exclusive/invalid) protocol. each block contains eight 32-bit words. note that the powerpc architecture defines the term "block" as the cacheable unit. for the 603e, the block size is equivalent to a cache line. a block diagram of the data cache organization is shown in figure 1-3.

the instruction cache also consists of 128 sets of four blocks, and each block consists of 32 bytes, an address tag, and a valid bit. the instruction cache may not be written to except through a block fill operation. in the pid7v-603e, the instruction cache is blocked only until the critical load completes. the pid7v-603e supports instruction fetching from other instruction cache lines following the forwarding of the critical first double word of a cache line load operation. successive instruction fetches from the cache line being loaded are forwarded, and accesses to other instruction cache lines can proceed during the cache line load operation. the instruction cache is not snooped, and cache coherency must be maintained by software. a fast hardware invalidation capability is provided to support cache maintenance.
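
the cache geometry described above (16 kbytes, four ways, 32-byte blocks, 128 sets) implies a particular split of a physical address into tag, set index, and byte offset. the following python sketch shows that arithmetic. it assumes the conventional set-associative decomposition (low bits select the byte, the next bits select the set, the remainder is the tag); the source does not spell out the field boundaries, and the function name split_address is hypothetical.

```python
CACHE_BYTES = 16 * 1024   # each cache is 16 kbytes
WAYS = 4                  # four-way set-associative
BLOCK_BYTES = 32          # eight 32-bit words per block

SETS = CACHE_BYTES // (WAYS * BLOCK_BYTES)   # works out to 128 sets
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1   # log2(32) = 5
INDEX_BITS = SETS.bit_length() - 1           # log2(128) = 7

def split_address(physical_address):
    """Return (tag, set_index, byte_offset) for a 32-bit physical address."""
    offset = physical_address & (BLOCK_BYTES - 1)
    index = (physical_address >> OFFSET_BITS) & (SETS - 1)
    tag = physical_address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
```

under this assumption the offset and index together occupy the low 12 bits of the address, which is the same width as the byte offset within a 4-kbyte page.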

the organization of the instruction cache is very similar to the data cache shown in figure 1-3. each cache block contains eight contiguous words from memory that are loaded from an 8-word boundary (that is, bits a27-a31 of the effective addresses are zero); thus, a cache block never crosses a page boundary. misaligned accesses across a page boundary can incur a performance penalty.

the 603e's cache blocks are loaded in four beats of 64 bits each when the 603e is configured with a 64-bit data bus; when the 603e is configured with a 32-bit bus, cache block loads are performed with eight beats of 32 bits each. the burst load is performed as critical double word first. the data cache is blocked to internal accesses until the load completes; the instruction cache allows sequential fetching during a cache block load. in the pid7v-603e, the critical double word is simultaneously written to the cache and forwarded to the requesting unit, thus minimizing stalls due to load delays.

to ensure coherency among caches in a multiprocessor (or multiple caching-device) implementation, the 603e implements the mei protocol. these three states, modified, exclusive, and invalid, indicate the state of the cache block as follows:

- modified: the cache block is modified with respect to system memory; that is, data for this address is valid only in the cache and not in system memory.
- exclusive: this cache block holds valid data that is identical to the data at this address in system memory. no other cache has this data.
- invalid: this cache block does not hold valid data.

cache coherency is enforced by on-chip bus snooping logic. since the 603e's data cache tags are single-ported, a simultaneous load or store and snoop access represent a resource contention. the snoop access is given first access to the tags. the load or store then occurs on the clock following the snoop.

figure 1-3. data cache organization (diagram: 128 sets, each with blocks 0-3; each block has an address tag, state bits, and words 0-7, 8 words/block)

1.3.4 exception model

this section describes the powerpc exception model and the 603e implementation, specifically. pid7v-603e-specific information is noted where applicable.

1.3.4.1 powerpc exception model

the powerpc exception mechanism allows the processor to change to supervisor state as a result of external signals, errors, or unusual conditions arising in the execution of instructions. these exceptions differ from the arithmetic exceptions defined by the ieee for floating-point operations. when exceptions occur, information about the state of the processor is saved to certain registers and the processor begins execution at an address (exception vector) predetermined for each exception. processing of exceptions occurs in supervisor mode.

although multiple exception conditions can map to a single exception vector, a more specific condition may be determined by examining a register associated with the exception, for example, the dsisr and the fpscr. additionally, some exception conditions can be explicitly enabled or disabled by software.

the powerpc architecture requires that exceptions be handled in program order; therefore, although a particular implementation may recognize exception conditions out of order, they are presented strictly in order. when an instruction-caused exception is recognized, any unexecuted instructions that appear earlier in the instruction stream, including any that have not yet entered the execute stage, are required to complete before the exception is taken. any exceptions caused by those instructions are handled first. likewise, exceptions that are asynchronous and precise are recognized when they occur, but are not handled until the instruction currently in the completion stage successfully completes execution or generates an exception, and the completed store queue is emptied.

unless a catastrophic condition causes a system reset or machine check exception, only one exception is handled at a time. if, for example, a single instruction encounters multiple exception conditions, those conditions are handled sequentially. after the exception handler handles an exception, the instruction execution continues until the next exception condition is encountered. however, in many cases there is no attempt to re-execute the instruction. this method of recognizing and handling exception conditions sequentially guarantees that exceptions are recoverable.

exception handlers should save the information stored in srr0 and srr1 early to prevent the program state from being lost due to a system reset or machine check exception or to an instruction-caused exception in the exception handler, and before enabling external interrupts.

the powerpc architecture supports four types of exceptions:

- synchronous, precise: these are caused by instructions. all instruction-caused exceptions are handled precisely; that is, the machine state at the time the exception occurs is known and can be completely restored. this means that (excluding the trap and system call exceptions) the address of the faulting instruction is provided to the exception handler and that neither the faulting instruction nor subsequent instructions in the code stream will complete execution before the exception is taken. once the exception is processed, execution resumes at the address of the faulting instruction (or at an alternate address provided by the exception handler). when an exception is taken due to a trap or system call instruction, execution resumes at an address provided by the handler.
- synchronous, imprecise: the powerpc architecture defines two imprecise floating-point exception modes, recoverable and nonrecoverable. even though the 603e provides a means to enable the imprecise modes, it implements these modes identically to the precise mode (that is, all enabled floating-point exceptions are always precise on the 603e). (note that the EC603E microprocessor does not support floating-point operations.)
- asynchronous, maskable: the external, system management interrupt (smi), and decrementer interrupts are maskable asynchronous exceptions. when these exceptions occur, their handling is postponed until the next instruction, and any exceptions associated with that instruction, completes execution. if there are no instructions in the execution units, the exception is taken immediately upon determination of the correct restart address (for loading srr0).
- asynchronous, nonmaskable: there are two nonmaskable asynchronous exceptions: system reset and the machine check exception. these exceptions may not be recoverable, or may provide a limited degree of recoverability. all exceptions report recoverability through the msr[ri] bit.

1.3.4.2 implementation-specific exception model

as specified by the powerpc architecture, all 603e exceptions can be described as either precise or imprecise and either synchronous or asynchronous. asynchronous exceptions (some of which are maskable) are caused by events external to the processor's execution; synchronous exceptions, which are all handled precisely by the 603e, are caused by instructions. the 603e exception classes are shown in figure 1-4.

although exceptions have other characteristics as well, such as whether they are maskable or nonmaskable, the distinctions shown in figure 1-4 define categories of exceptions that the 603e handles uniquely. note that figure 1-4 includes no synchronous imprecise exceptions. while the powerpc architecture supports imprecise handling of floating-point exceptions, the 603e, with the exception of the EC603E microprocessor, implements floating-point exception modes as precise exceptions.

the 603e's exceptions, and conditions that cause them, are listed in figure 1-5.

figure 1-4. exception classifications

synchronous/asynchronous | precise/imprecise | exception type
asynchronous, nonmaskable | imprecise | machine check, system reset
asynchronous, maskable | precise | external interrupt, decrementer, system management interrupt
synchronous | precise | instruction-caused exceptions

figure 1-5. exceptions and conditions

exception type | vector offset (hex) | causing conditions

reserved | 00000 |

system reset | 00100 | a system reset is caused by the assertion of either sreset or hreset.

machine check | 00200 | a machine check is caused by the assertion of the tea signal during a data bus transaction, assertion of mcp, or an address or data parity error.

dsi | 00300 | the cause of a dsi exception can be determined by the bit settings in the dsisr, listed as follows:
  - bit 1: set if the translation of an attempted access is not found in the primary hash table entry group (hteg), or in the rehashed secondary hteg, or in the range of a dbat register; otherwise cleared.
  - bit 4: set if a memory access is not permitted by the page or dbat protection mechanism; otherwise cleared.
  - bit 5: set by an eciwx or ecowx instruction if the access is to an address that is marked as write-through, or by execution of a load/store instruction that accesses a direct-store segment.
  - bit 6: set for a store operation and cleared for a load operation.
  - bit 11: set if eciwx or ecowx is used and ear[e] is cleared.

isi | 00400 | an isi exception is caused when an instruction fetch cannot be performed for any of the following reasons:
  - the effective (logical) address cannot be translated. that is, there is a page fault for this portion of the translation, so an isi exception must be taken to load the pte (and possibly the page) into memory.
  - the fetch access is to a direct-store segment (indicated by srr1[3] set).
  - the fetch access violates memory protection (indicated by srr1[4] set). if the key bits (ks and kp) in the segment register and the pp bits in the pte are set to prohibit read access, instructions cannot be fetched from this location.

external interrupt | 00500 | an external interrupt is caused when msr[ee] = 1 and the int signal is asserted.

alignment | 00600 | an alignment exception is caused when the 603e cannot perform a memory access for any of the reasons described below:
  - the operand of a floating-point load or store instruction is not word-aligned.
  - the operand of an lmw, stmw, lwarx, or stwcx. instruction is not aligned.
  - the operand of a single-register load or store operation is not aligned, and the 603e is in little-endian mode (pid6-603e only).
  - the execution of a floating-point load or store instruction to a direct-store segment.
  - the operand of a load, store, load multiple, store multiple, load string, or store string instruction crosses a segment boundary into a direct-store segment, or crosses a protection boundary.
  - execution of a misaligned eciwx or ecowx instruction (pid7v-603e only).
  - the instruction is lmw, stmw, lswi, lswx, stswi, or stswx and the 603e is in little-endian mode.
  - the operand of dcbz is in memory that is write-through-required or caching-inhibited.

program | 00700 | a program exception is caused by one of the following exception conditions, which correspond to bit settings in srr1 and arise during execution of an instruction:
  - floating-point enabled exception: a floating-point enabled exception condition is generated when the following condition is met: (msr[fe0] | msr[fe1]) & fpscr[fex] is 1. fpscr[fex] is set by the execution of a floating-point instruction that causes an enabled exception or by the execution of one of the "move to fpscr" instructions that results in both an exception condition bit and its corresponding enable bit being set in the fpscr. (not supported by the EC603E microprocessor.)
  - illegal instruction: an illegal instruction program exception is generated when execution of an instruction is attempted with an illegal opcode or illegal combination of opcode and extended opcode fields (including powerpc instructions not implemented in the 603e), or when execution of an optional instruction not provided in the 603e is attempted (these do not include those optional instructions that are treated as no-ops).
  - privileged instruction: a privileged instruction type program exception is generated when the execution of a privileged instruction is attempted and the msr register user privilege bit, msr[pr], is set. in the 603e, this exception is generated for mtspr or mfspr with an invalid spr field if spr[0] = 1 and msr[pr] = 1. this may not be true for all powerpc processors.
  - trap: a trap type program exception is generated when any of the conditions specified in a trap instruction is met.

floating-point unavailable | 00800 | a floating-point unavailable exception is caused by an attempt to execute a floating-point instruction (including floating-point load, store, and move instructions) when the floating-point available bit is disabled (msr[fp] = 0). note that the EC603E microprocessor takes a floating-point unavailable exception when execution of a floating-point instruction is attempted.

decrementer | 00900 | the decrementer exception occurs when the most significant bit of the decrementer (dec) register transitions from 0 to 1. the exception must also be enabled with the msr[ee] bit.

reserved | 00a00-00bff |

system call | 00c00 | a system call exception occurs when a system call (sc) instruction is executed.

trace | 00d00 | a trace exception is taken when msr[se] = 1 or when the currently completing instruction is a branch and msr[be] = 1.

reserved | 00e00 | the 603e does not generate an exception to this vector. other powerpc processors may use this vector for floating-point assist exceptions.

reserved | 00e10-00fff |

instruction translation miss | 01000 | an instruction translation miss exception is caused when an effective address for an instruction fetch cannot be translated by the itlb.

data load translation miss | 01100 | a data load translation miss exception is caused when an effective address for a data load operation cannot be translated by the dtlb.

data store translation miss | 01200 | a data store translation miss exception is caused when an effective address for a data store operation cannot be translated by the dtlb, or where a dtlb hit occurs, and the change bit in the pte must be set due to a data store operation.

instruction address breakpoint | 01300 | an instruction address breakpoint exception occurs when the address (bits 0-29) in the iabr matches the next instruction to complete in the completion unit, and the iabr enable bit (bit 30) is set.

system management interrupt | 01400 | a system management interrupt is caused when msr[ee] = 1 and the smi input signal is asserted.

reserved | 01500-02fff |

1.3.5 memory management

the following subsections describe the memory management features of the powerpc architecture, and the 603e implementation, respectively.

1.3.5.1 powerpc memory management

the primary functions of the mmu are to translate logical (effective) addresses to physical addresses for memory accesses, and to provide access protection on blocks and pages of memory.

there are two types of accesses generated by the 603e that require address translation: instruction accesses, and data accesses to memory generated by load and store instructions.

the powerpc mmu and exception model support demand-paged virtual memory. virtual memory management permits execution of programs larger than the size of physical memory; demand-paged implies that individual pages are loaded into physical memory from system memory only when they are first accessed by an executing program.

the hashed page table is a variable-sized data structure that defines the mapping between virtual page numbers and physical page numbers. the page table size is a power of 2, and its starting address is a multiple of its size.

the page table contains a number of page table entry groups (ptegs). a pteg contains eight page table entries (ptes) of eight bytes each; therefore, each pteg is 64 bytes long. pteg addresses are entry points for table search operations.

address translations are enabled by setting bits in the msr: msr[ir] enables instruction address translations and msr[dr] enables data address translations.

1.3.5.2 implementation-specific memory management

the instruction and data memory management units in the 603e provide 4 gbytes of logical address space accessible to supervisor and user programs with a 4-kbyte page size and 256-mbyte segment size. block sizes range from 128 kbyte to 256 mbyte and are software selectable. in addition, the 603e uses an interim 52-bit virtual address and hashed page tables for generating 32-bit physical addresses. the mmus in the 603e rely on the exception processing mechanism for the implementation of the paged virtual memory environment and for enforcing protection of designated memory areas.

instruction and data tlbs provide address translation in parallel with the on-chip cache access, incurring no additional time penalty in the event of a tlb hit. a tlb is a cache of the most recently used page table entries. software is responsible for maintaining the consistency of the tlb with memory. the 603e's tlbs are 64-entry, two-way set-associative caches that contain instruction and data address translations. the 603e provides hardware assist for software table search operations through the hashed page table on tlb misses. supervisor software can invalidate tlb entries selectively.

the 603e also provides independent four-entry bat arrays for instructions and data that maintain address translations for blocks of memory. these entries define blocks that can vary from 128 kbytes to 256 mbytes. the bat arrays are maintained by system software.

as specified by the powerpc architecture, the hashed page table is a variable-sized data structure that defines the mapping between virtual page numbers and physical page numbers. the page table size is a power of 2, and its starting address is a multiple of its size.

also as specified by the powerpc architecture, the page table contains a number of page table entry groups (ptegs). a pteg contains eight page table entries (ptes) of eight bytes each; therefore, each pteg is 64 bytes long. pteg addresses are entry points for table search operations.

1.3.6 instruction timing

the 603e is a pipelined superscalar processor. a pipelined processor is one in which the processing of an instruction is reduced into discrete stages. because the processing of an instruction is broken into a series of stages, an instruction does not require the entire resources of an execution unit. for example, after an instruction completes the decode stage, it can pass on to the next stage, while the subsequent instruction can advance into the decode stage. this improves the throughput of the instruction flow. for example, it may take three cycles for a floating-point instruction to complete, but if there are no stalls in the floating-point pipeline, a series of floating-point instructions can have a throughput of one instruction per cycle.
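
the latency-versus-throughput point above can be made concrete with a small calculation. the python sketch below models an idealized pipeline with no stalls, in which the first result appears after the full latency and each later result follows one cycle apart. the function names are hypothetical, and the three-cycle latency is taken from the floating-point example in the text; real instruction timing depends on dependencies and resource conflicts that this model ignores.

```python
def pipelined_cycles(n_instructions, latency=3):
    """Cycles to finish n back-to-back instructions in an ideal, stall-free pipeline."""
    if n_instructions == 0:
        return 0
    # the first instruction takes the full latency; each later one completes
    # one cycle after its predecessor
    return latency + (n_instructions - 1)

def unpipelined_cycles(n_instructions, latency=3):
    """Cycles if each instruction had to finish before the next one could start."""
    return n_instructions * latency

one = pipelined_cycles(1)
ten_pipelined = pipelined_cycles(10)
ten_serial = unpipelined_cycles(10)
```

as the instruction count grows, the pipelined cost per instruction approaches one cycle, which is the throughput figure quoted above.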

the instruction pipeline in the 603e has four major pipeline stages, described as follows:

- the fetch pipeline stage primarily involves retrieving instructions from the memory system and determining the location of the next instruction fetch. additionally, the bpu decodes branches during the fetch stage and folds out branch instructions before the dispatch stage if possible.
- the dispatch pipeline stage is responsible for decoding the instructions supplied by the instruction fetch stage, and determining which of the instructions are eligible to be dispatched in the current cycle. in addition, the source operands of the instructions are read from the appropriate register file and dispatched with the instruction to the execute pipeline stage. at the end of the dispatch pipeline stage, the dispatched instructions and their operands are latched by the appropriate execution unit.
- during the execute pipeline stage each execution unit that has an executable instruction executes the selected instruction (perhaps over multiple cycles), writes the instruction's result into the appropriate rename register, and notifies the completion stage that the instruction has finished execution. in the case of an internal exception, the execution unit reports the exception to the completion/writeback pipeline stage and discontinues instruction execution until the exception is handled. the exception is not signaled until that instruction is the next to be completed. execution of most floating-point instructions is pipelined within the fpu allowing up to three instructions to be executing in the fpu concurrently. the pipeline stages for the floating-point unit are multiply, add, and round-convert. execution of most load/store instructions is also pipelined. the load/store unit has two pipeline stages. the first stage is for effective address calculation and mmu translation and the second stage is for accessing the data in the cache. (note that the EC603E microprocessor does not support the floating-point unit.)
- the complete/writeback pipeline stage maintains the correct architectural machine state and transfers the contents of the rename registers to the gprs and fprs as instructions are retired. if the completion logic detects an instruction causing an exception, all following instructions are cancelled, their execution results in rename registers are discarded, and instructions are fetched from the correct instruction stream.

a superscalar processor is one that issues multiple independent instructions into multiple pipelines allowing instructions to execute in parallel. the 603e has five independent execution units, one each for integer instructions, floating-point instructions (floating-point instructions are trapped by the floating-point unavailable exception on the EC603E microprocessor), branch instructions, load/store instructions, and system register instructions. the iu and the fpu each have dedicated register files for maintaining operands (gprs and fprs, respectively), allowing integer calculations and floating-point calculations to occur simultaneously without interference. integer division performance of the pid7v-603e has been improved, with the divwux and divwx instructions executing in 20 clock cycles, instead of the 37 cycles required in the pid6-603e.

the 603e provides support for single-cycle store and it provides an adder/comparator in the system register unit that allows the dispatch and execution of multiple integer add and compare instructions on each cycle. refer to chapter 6, "instruction timing," for more information.

because the powerpc architecture can be applied to such a wide variety of implementations, instruction timing among various powerpc processors varies accordingly.

1.3.7 system interface

the system interface is specific for each powerpc microprocessor implementation.

the 603e provides a versatile system interface that allows for a wide range of implementations. the interface includes a 32-bit address bus, a 32- or 64-bit data bus, and 56 control and information signals (see figure 1-6). the system interface allows for address-only transactions as well as address and data transactions. the 603e control and information signals include the address arbitration, address start, address transfer, transfer attribute, address termination, data arbitration, data transfer, data termination, and processor state signals. test and control signals provide diagnostics for selected internal circuits.

figure 1-6. system interface (diagram: the 603e with its signal groups: address, address arbitration, address start, address transfer, transfer attribute, address termination, clocks, data, data arbitration, data transfer, data termination, processor state, test and control, and +3.3 v)

the system interface supports bus pipelining, which allows the address tenure of one transaction to overlap the data tenure of another. the extent of the pipelining depends on external arbitration and control circuitry. similarly, the 603e supports split-bus transactions for systems with multiple potential bus masters: one device can have mastership of the address bus while another has mastership of the data bus. allowing multiple bus transactions to occur simultaneously increases the available bus bandwidth for other activity and as a result, improves performance.

the 603e supports multiple masters through a bus arbitration scheme that allows various devices to compete for the shared bus resource. the arbitration logic can implement priority protocols, such as fairness, and can park masters to avoid arbitration overhead. the mei protocol ensures coherency among multiple devices and system memory. also, the 603e's on-chip caches and tlbs and optional second-level caches can be controlled externally.

the 603e's clocking structure allows the bus to operate at integer multiples of the processor cycle time.

the following sections describe the 603e bus support for memory operations. note that some signals perform different functions depending upon the addressing protocol used.

1.3.7.1 memory accesses

the 603e's data bus is configured at power-up to either a 32- or 64-bit width. when the 603e is configured with a 32-bit data bus, memory accesses allow transfer sizes of 8, 16, 24, or 32 bits in one bus clock cycle. data transfers occur in either single-beat transactions, or two-beat or eight-beat burst transactions, with a single-beat transaction transferring as many as 32 bits. single- or double-beat transactions are caused by noncached accesses that access memory directly (that is, reads and writes when caching is disabled, caching-inhibited accesses, and stores in write-through mode). eight-beat burst transactions, which always transfer an entire cache line (32 bytes), are initiated when a line is read from or written to memory.

when the 603e is configured with a 64-bit data bus, memory accesses allow transfer sizes of 8, 16, 24, 32, 40, 48, 56, or 64 bits in one bus clock cycle. data transfers occur in either single-beat transactions or four-beat burst transactions. single-beat transactions are caused by noncached accesses that access memory directly (that is, reads and writes when caching is disabled, caching-inhibited accesses, and stores in write-through mode). four-beat burst transactions, which always transfer an entire cache line (32 bytes), are initiated when a line is read from or written to memory.
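
the beat counts quoted above follow directly from the cache line size and the configured bus width. the python sketch below reproduces that arithmetic; the function names burst_beats and single_beat_sizes are hypothetical and the code is only a worked example of the numbers in the text, not a model of the bus protocol.

```python
CACHE_LINE_BYTES = 32  # a burst always transfers one full cache line

def burst_beats(bus_width_bits):
    """Number of data beats needed to move one cache line over the data bus."""
    if bus_width_bits not in (32, 64):
        raise ValueError("the data bus is configured as 32 or 64 bits wide")
    bytes_per_beat = bus_width_bits // 8
    return CACHE_LINE_BYTES // bytes_per_beat

def single_beat_sizes(bus_width_bits):
    """Transfer sizes (in bits) that fit in one bus clock cycle: whole bytes up to the bus width."""
    return list(range(8, bus_width_bits + 1, 8))

beats_64 = burst_beats(64)
beats_32 = burst_beats(32)
sizes_32 = single_beat_sizes(32)
sizes_64 = single_beat_sizes(64)
```

halving the bus width doubles the number of beats per cache line, which is the trade-off between the two bus configurations.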
1.3.7.2 signals the 603e signals are grouped as follows: address arbitration signals?he 603e uses these signals to arbitrate for address bus mastership. address transfer start signals?hese signals indicate that a bus master has begun a transaction on the address bus. address transfer signals?hese signals, which consist of the address bus, address parity, and address parity error signals, are used to transfer the address and to ensure the integrity of the transfer. transfer attribute signals?hese signals provide information about the type of transfer, such as the transfer size and whether the transaction is bursted, write-through, or caching-inhibited. address transfer termination signals?hese signals are used to acknowledge the end of the address phase of the transaction. they also indicate whether a condition exists that requires the address phase to be repeated.
data arbitration signals: the 603e uses these signals to arbitrate for data bus mastership.
data transfer signals: these signals, which consist of the data bus, data parity, and data parity error signals, are used to transfer the data and to ensure the integrity of the transfer.
data transfer termination signals: data termination signals are required after each data beat in a data transfer. in a single-beat transaction, the data termination signals also indicate the end of the tenure, while in burst accesses, the data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat. they also indicate whether a condition exists that requires the data phase to be repeated.
system status signals: these signals include the interrupt signal, checkstop signals, and both soft- and hard-reset signals. these signals are used to interrupt and, under various conditions, to reset the processor.
processor state signals: these signals indicate the state of the reservation coherency bit, enable the time base, provide machine quiesce control, and cause a machine halt on execution of a tlbsync instruction.
ieee 1149.1 (jtag)/cop interface signals: the ieee 1149.1 test unit and the common on-chip processor (cop) unit are accessed through a shared set of input, output, and clocking signals. the ieee 1149.1/cop interface provides a means for boundary scan testing and internal debugging of the 603e.
test interface signals: these signals are used for production testing.
clock signals: these signals determine the system clock frequency. these signals can also be used to synchronize multiprocessor systems.

note: a bar over a signal name indicates that the signal is active low, for example, artry (address retry) and ts (transfer start). active-low signals are referred to as asserted (active) when they are low and negated when they are high. signals that are not active low, such as ap[0-3] (address bus parity signals) and tt[0-4] (transfer type signals), are referred to as asserted when they are high and negated when they are low.
1.3.7.3 signal configuration

figure 1-7 illustrates the 603e's logical pin configuration, showing how the signals are grouped.

figure 1-7. signal groups (pin counts in parentheses)

address arbitration: br (1), bg (1), abb (1)
address start: ts (1)
address bus: a[0-31] (32), ap[0-3] (4), ape (1)
transfer attribute: tt[0-4] (5), tbst (1), tsiz[0-2] (3), gbl (1), ci (1), wt (1), cse[0-1] (2), tc[0-1] (2)
address termination: aack (1), artry (1)
clocks: sysclk (1), clk_out (1), pll_cfg[0-3] (4)
data arbitration: dbg (1), dbwo (1), dbb (1)
data transfer: dh[0-31], dl[0-31] (64), dp[0-7] (8), dpe (1), dbdis (1)
data termination: ta (1), drtry (1), tea (1)
interrupts: int, smi (2), mcp (1)
checkstops: ckstp_in, ckstp_out (2)
reset: hreset, sreset (2)
processor status: rsrv (1), qreq, qack (2), tben (1), tlbisync (1)
jtag/cop interface: trst, tck, tms, tdi, tdo (5)
lssd test control: test (3)
power: +3.3 v
chapter 2
programming model

this chapter describes the powerpc programming model with respect to the powerpc 603e microprocessor. it consists of three major sections that describe the following:

registers implemented in the 603e
operand conventions
the 603e instruction set

2.1 register set

this section describes the register organization in the 603e as defined by the three levels of the powerpc architecture: the user instruction set architecture (uisa), the virtual environment architecture (vea), and the operating environment architecture (oea), as well as the 603e implementation-specific registers. full descriptions of the basic register set defined by the powerpc architecture are provided in chapter 2, "powerpc register set," in the programming environments manual.

the powerpc architecture defines register-to-register operations for all computational instructions. source data for these instructions is accessed from the on-chip registers or is provided as an immediate value embedded in the opcode. the three-register instruction format allows specification of a target register distinct from the two source registers, thus preserving the original data for use by other instructions and reducing the number of instructions required for certain operations. data is transferred between memory and registers with explicit load and store instructions only.

note that there may be registers common to other powerpc processors that are not implemented in the 603e. when the 603e detects special-purpose register (spr) encodings other than those defined in this document, it either takes an exception or it treats the instruction as a no-op. (note that exceptions are referred to as interrupts in the architecture specification.) conversely, some sprs in the 603e may not be implemented in other powerpc processors, or may not be implemented in the same way in other powerpc processors.
2.1.1 powerpc register set

the powerpc uisa registers, shown in figure 2-1, can be accessed by either user- or supervisor-level instructions (the architecture specification refers to user- and supervisor-level as problem state and privileged state, respectively). the general-purpose registers (gprs) and floating-point registers (fprs) are accessed through instruction operands. (note that the EC603E microprocessor does not support the floating-point register file; an attempt to access the floating-point register file will result in a floating-point unavailable exception.) access to registers can be explicit (that is, through the use of specific instructions for that purpose such as the mtspr and mfspr instructions) or implicit as part of the execution (or side effect) of an instruction. some registers are accessed both explicitly and implicitly.

the number to the right of the register name indicates the number that is used in the syntax of the instruction operands to access the register (for example, the number used to access the xer is spr1). for more information on the powerpc register set, refer to chapter 2, "powerpc register set," in the programming environments manual.
figure 2-1. programming model: registers

user model
  general-purpose registers: gpr0-gpr31
  floating-point registers (note 2): fpr0-fpr31
  condition register: cr
  floating-point status and control register (note 2): fpscr
  xer: xer (spr 1)
  link register: lr (spr 8)
  count register: ctr (spr 9)
  time base facility (for reading): tbl (tbr 268), tbu (tbr 269)

supervisor model
  configuration registers
    machine state register: msr
    processor version register: pvr (spr 287)
    hardware implementation registers (note 1): hid0 (spr 1008), hid1 (spr 1009)
  memory management registers
    instruction bat registers: ibat0u (spr 528), ibat0l (spr 529), ibat1u (spr 530), ibat1l (spr 531), ibat2u (spr 532), ibat2l (spr 533), ibat3u (spr 534), ibat3l (spr 535)
    data bat registers: dbat0u (spr 536), dbat0l (spr 537), dbat1u (spr 538), dbat1l (spr 539), dbat2u (spr 540), dbat2l (spr 541), dbat3u (spr 542), dbat3l (spr 543)
    sdr1: sdr1 (spr 25)
    segment registers: sr0-sr15
    software table search registers (note 1): dmiss (spr 976), dcmp (spr 977), hash1 (spr 978), hash2 (spr 979), imiss (spr 980), icmp (spr 981), rpa (spr 982)
  exception handling registers
    dsisr: dsisr (spr 18)
    data address register: dar (spr 19)
    save and restore: srr0 (spr 26), srr1 (spr 27)
    sprgs: sprg0 (spr 272), sprg1 (spr 273), sprg2 (spr 274), sprg3 (spr 275)
  miscellaneous registers
    time base facility (for writing): tbl (spr 284), tbu (spr 285)
    decrementer: dec (spr 22)
    external address register (optional): ear (spr 282)
    instruction address breakpoint register (note 1): iabr (spr 1010)

notes:
1 these registers are 603e-specific (pid6-603e and pid7v-603e) registers. they may not be supported by other powerpc processors.
2 not supported on the EC603E microprocessor.
the 603e's user-level registers are described as follows:

user-level registers (uisa): the user-level registers can be accessed by all software with either user or supervisor privileges. the user-level register set includes the following:

general-purpose registers (gprs). the general-purpose register file consists of thirty-two 32-bit gprs designated as gpr0-gpr31. this register file serves as the data source or destination for all integer instructions and provides data for generating addresses.

floating-point registers (fprs). the floating-point register file consists of thirty-two 64-bit fprs designated as fpr0-fpr31, which serves as the data source or destination for all floating-point instructions. these registers can contain data objects of either single- or double-precision floating-point format. (the floating-point register file is not supported on the EC603E microprocessor; an attempt to access the floating-point register file will result in a floating-point unavailable exception.)

condition register (cr). the cr is a 32-bit register, divided into eight 4-bit fields, cr0-cr7, that reflects the results of certain arithmetic operations and provides a mechanism for testing and branching.

floating-point status and control register (fpscr). the fpscr is a user-control register that contains all floating-point exception signal bits, exception summary bits, exception enable bits, and rounding control bits needed for compliance with the ieee 754 standard. (the fpu is not supported on the EC603E microprocessor; an attempt to access the floating-point register file will result in a floating-point unavailable exception.)

the remaining user-level registers are sprs. note that the powerpc architecture provides a separate mechanism for accessing sprs (the mtspr and mfspr instructions). these instructions are commonly used to explicitly access certain registers, while other sprs may be more typically accessed as the side effect of executing other instructions.

xer register (xer). the xer is a 32-bit register that indicates overflow and carries for integer operations. it is set implicitly by many instructions.

link register (lr). the 32-bit link register provides the branch target address for the branch conditional to link register (bclrx) instruction, and can optionally be used to hold the logical address (referred to as the effective address in the architecture specification) of the instruction that follows a branch and link instruction, typically used for linking to subroutines.

count register (ctr). the ctr is a 32-bit register for holding a loop count that can be decremented during execution of appropriately coded branch instructions. the ctr can also provide the branch target address for the branch conditional to count register (bcctrx) instruction.
user-level registers (vea): the powerpc vea introduces the time base facility (tb) for reading. the tb is a 64-bit register pair whose contents are incremented once every four bus clock cycles. the tb consists of two 32-bit registers: time base upper (tbu) and time base lower (tbl). note that the time base registers are read-only when in user state.

the 603e's supervisor-level registers are described as follows:

supervisor-level registers (oea): the oea defines the registers that are used typically by an operating system for such operations as memory management, configuration, and exception handling. the supervisor-level registers defined by the powerpc architecture for 32-bit implementations are described as follows:

configuration registers

machine state register (msr). the msr defines the state of the processor. the msr can be modified by the move to machine state register (mtmsr), system call (sc), and return from exception (rfi) instructions. it can be read by the move from machine state register (mfmsr) instruction.

implementation note: the 603e defines msr[13] as the power management enable (pow) bit and msr[14] as the temporary gpr remapping (tgpr) bit. these additional bits are described in table 2-1.

table 2-1. msr[pow] and msr[tgpr] bits

bit  name  description
13   pow   power management enable (603e-specific)
           0 disables programmable power modes (normal operation mode).
           1 enables programmable power modes (nap, doze, or sleep mode).
           this bit controls the programmable power modes only; it has no effect on dynamic power management (dpm). msr[pow] may be altered with an mtmsr instruction only. also, when altering the pow bit, software may alter only this bit in the msr and no others. the mtmsr instruction must be followed by a context-synchronizing instruction. see chapter 9, "power management," for more information on power management.
14   tgpr  temporary gpr remapping (603e-specific)
           0 normal operation
           1 tgpr mode. tgpr0-tgpr3 are overlaid on gpr0-gpr3 for use by tlb miss routines. when this bit is set, all instruction accesses to gpr0-gpr3 are mapped to tgpr0-tgpr3, respectively. the contents of gpr0-gpr3 are unchanged as long as this bit remains set. attempts to use gpr4-gpr31 when this bit is set yield undefined results.
           the tgpr bit is set when either an instruction tlb miss, data read miss, or data write miss exception is taken. the tgpr bit is cleared by an rfi instruction.
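the bit numbers in table 2-1 follow the powerpc convention in which bit 0 is the most-significant bit of the 32-bit register. the python sketch below shows how msr[13] and msr[14] translate into ordinary masks; the helper names are hypothetical and chosen only for illustration.

```python
def ppc_bit(n, width=32):
    """Mask for bit n, where bit 0 is the most-significant bit."""
    return 1 << (width - 1 - n)


MSR_POW = ppc_bit(13)   # power management enable
MSR_TGPR = ppc_bit(14)  # temporary gpr remapping


def with_pow_enabled(msr):
    """Return the msr value with only the pow bit changed (set to 1)."""
    return (msr | MSR_POW) & 0xFFFFFFFF


def tgpr_mode(msr):
    """Report whether gpr0-gpr3 are currently remapped to tgpr0-tgpr3."""
    return bool(msr & MSR_TGPR)


print(hex(MSR_POW), hex(MSR_TGPR))
```

on real hardware the new value would be written with mtmsr and followed by a context-synchronizing instruction, as table 2-1 requires; the sketch only models the bit arithmetic.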
processor version register (pvr). this register is a read-only register that identifies the version (model) and revision level of the powerpc processor.

implementation note: the processor version number is 6 for the pid6-603e and 7 for the pid7v-603e. the processor revision level starts at 0x0100 and changes for each chip revision. the revision level is updated on all silicon revisions.

memory management registers

block-address translation (bat) registers. the 603e includes eight block-address translation registers (bats), consisting of four pairs of instruction bats (ibat0u-ibat3u and ibat0l-ibat3l) and four pairs of data bats (dbat0u-dbat3u and dbat0l-dbat3l). see figure 2-1 for a list of the spr numbers for the bat registers.

sdr1. the sdr1 register specifies the page table base address used in virtual-to-physical address translation. (note that physical address is referred to as real address in the architecture specification.)

segment registers (sr). the powerpc oea defines sixteen 32-bit segment registers (sr0-sr15). note that srs are implemented on 32-bit implementations only. the fields in the segment register are interpreted differently depending on the value of bit 0.

exception handling registers

data address register (dar). after a data access or an alignment exception, the dar is set to the effective address generated by the faulting instruction.

sprg0-sprg3. the sprg0-sprg3 registers are provided for operating system use.

dsisr. the dsisr defines the cause of data access and alignment exceptions.

machine status save/restore register 0 (srr0). the srr0 is used to save machine status on exceptions and to restore machine status when an rfi instruction is executed.

machine status save/restore register 1 (srr1). the srr1 is used to save machine status on exceptions and to restore machine status when an rfi instruction is executed.
implementation note: the 603e implements the key bit (bit 12) in the srr1 register in order to simplify the table search software. for more information refer to chapter 5, "memory management."

miscellaneous registers

the time base facility (tb) for writing. the tb is a 64-bit register pair that can be used to provide time of day or interval timing. it consists of two 32-bit registers: time base upper (tbu) and time base lower (tbl). the tb is incremented once every four bus clock cycles.
decrementer (dec). the dec register is a 32-bit decrementing counter that provides a mechanism for causing a decrementer exception after a programmable delay. the dec is decremented once every four bus clock cycles.

external access register (ear). the ear is a 32-bit register used in conjunction with the eciwx and ecowx instructions. while the powerpc architecture specifies that the low-order six bits of the ear (bits 26-31) are used to select a device, the 603e only implements the low-order 4 bits (bits 28-31). note that the ear register and the eciwx and ecowx instructions are optional in the powerpc architecture and may not be supported in all powerpc processors that implement the oea.

2.1.2 implementation-specific registers

the 603e includes several implementation-specific sprs that are not defined by the powerpc architecture. they are the dmiss, imiss, dcmp, icmp, hash1, hash2, rpa, hid0, hid1, and iabr registers. these registers can be accessed by supervisor-level instructions only. any attempt to access these sprs with user-level instructions results in a supervisor-level exception. the spr numbers for these registers are shown in figure 2-1.

the dmiss, imiss, dcmp, icmp, hash1, hash2, and rpa registers are used for software table search operations and should only be accessed when address translation is disabled (that is, msr[ir] = 0 and msr[dr] = 0). for a complete discussion of software table search operations, refer to section 5.5.2, "implementation-specific table search operation."

2.1.2.1 hardware implementation registers (hid0 and hid1)

the hid0 and hid1 registers, shown in figure 2-2 and figure 2-3 respectively, define enable bits for various 603e-specific features.

figure 2-2. hardware implementation register 0 (hid0)
[register bit-field diagram; field positions are listed in table 2-2]
table 2-2 shows the bit definitions for hid0.

table 2-2. hid0 bit settings

bit(s)  name    description
0       emcp    enable machine check pin
1       -       reserved
2       eba     enable bus address parity checking
3       ebd     enable bus data parity checking
4       sbclk   select bus clock for test clock pin
5       eice    enable ice outputs: pipeline tracking support
6       eclk    enable external test clock pin
7       par     disable precharge of artry and shared signals
8       doze    doze mode: pll, time base, and snooping active (note 1)
9       nap     nap mode: pll and time base active (note 1)
10      sleep   sleep mode: no external clock required (note 1)
11      dpm     enable dynamic power management (note 1)
12      riseg   reserved for test
13-14   -       reserved
15      nhr     reserved
16      ice     instruction cache enable (note 2)
17      dce     data cache enable (note 2)
18      ilock   instruction cache lock (note 2)
19      dlock   data cache lock (note 2)
20      icfi    instruction cache flash invalidate (note 2)
21      dcfi    data cache flash invalidate (note 2)
22-23   -       reserved
24      ifem    instruction fetch enable m (pid7v-603e only)
25-26   -       reserved
27      fbiob   force branch indirect on bus
28      abe     address broadcast enable (note 2) (pid7v-603e only)
29-30   -       reserved
31      noopti  no-op touch instructions

notes:
1. see chapter 9, "power management," for more information.
2. see chapter 3, "instruction and data cache operation," for more information.
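as a rough illustration of how the bit positions in table 2-2 map onto a 32-bit value, the python sketch below builds a mask for each named hid0 field (bit 0 is the most-significant bit) and lists the fields that are set in a given value. the dictionary and function names are hypothetical; only the bit positions come from the table.

```python
# Bit positions copied from table 2-2 (bit 0 is the most-significant bit).
HID0_BITS = {
    "emcp": 0, "eba": 2, "ebd": 3, "sbclk": 4, "eice": 5, "eclk": 6,
    "par": 7, "doze": 8, "nap": 9, "sleep": 10, "dpm": 11, "riseg": 12,
    "nhr": 15, "ice": 16, "dce": 17, "ilock": 18, "dlock": 19,
    "icfi": 20, "dcfi": 21, "ifem": 24, "fbiob": 27, "abe": 28,
    "noopti": 31,
}


def hid0_mask(name):
    """Return the 32-bit mask for one named hid0 field."""
    return 1 << (31 - HID0_BITS[name])


def decode_hid0(value):
    """Return the set of field names whose bit is 1 in value."""
    return {name for name in HID0_BITS if value & hid0_mask(name)}


# Example: a value with both caches enabled.
print(sorted(decode_hid0(hid0_mask("ice") | hid0_mask("dce"))))
```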
figure 2-3. hardware implementation register 1 (hid1)
[register bit-field diagram: pc0-pc3 in bits 0-3; bits 4-31 reserved and read as 0]

table 2-3 shows the bit definitions for hid1.

table 2-3. hid1 bit settings

bit(s)  name  description
0       pc0   pll configuration bit 0 (read-only)
1       pc1   pll configuration bit 1 (read-only)
2       pc2   pll configuration bit 2 (read-only)
3       pc3   pll configuration bit 3 (read-only)
4-31    -     reserved

note: the clock configuration bits reflect the state of the pll_cfg[0-3] signals.

2.1.2.2 data and instruction tlb miss address registers (dmiss and imiss)

the dmiss and imiss registers have the same format as shown in figure 2-4. they are loaded automatically upon a data or instruction tlb miss. the dmiss and imiss contain the effective page address of the access that caused the tlb miss exception. the contents are used by the 603e when calculating the values of hash1 and hash2, and by the tlbld and tlbli instructions when loading a new tlb entry. note that the 603e always loads the dmiss register with a big-endian address, even when msr[le] is set. these registers can be both read and written by software.

figure 2-4. dmiss and imiss registers
[register bit-field diagram: effective page address in bits 0-31]

2.1.2.3 data and instruction tlb compare registers (dcmp and icmp)

the dcmp and icmp registers are shown in figure 2-5. these registers contain the first word in the required pte. the contents are constructed automatically from the contents of the segment registers and the effective address (dmiss or imiss) when a tlb miss exception occurs. each pte read from the tables during the table search process should be compared with this value to determine whether or not the pte is a match. upon execution of a tlbld or tlbli instruction the upper 25 bits of the dcmp or icmp register and 11 bits of the effective address operand are loaded into the first word of the selected tlb entry. these registers can be both read and written by software.

figure 2-5. dcmp and icmp registers
[register bit-field diagram: v in bit 0, vsid in bits 1-24, bit 25 reserved (0), api in bits 26-31]

table 2-4 describes the bit settings for the dcmp and icmp registers.

table 2-4. dcmp and icmp bit settings

bits   name  description
0      v     valid bit. set by the processor on a tlb miss exception.
1-24   vsid  virtual segment id. copied from vsid field of corresponding segment register.
25     -     reserved
26-31  api   abbreviated page index. copied from api of effective address.

2.1.2.4 primary and secondary hash address registers (hash1 and hash2)

the hash1 and hash2 registers contain the physical addresses of the primary and secondary ptegs for the access that caused the tlb miss exception. for convenience, the 603e automatically constructs the full physical address by routing bits 0-6 of sdr1 into hash1 and hash2 and clearing the lower 6 bits. these registers are read-only and are constructed from the contents of the dmiss or imiss register (the register choice is determined by which miss was last acknowledged). the format for the hash1 and hash2 registers is shown in figure 2-6.

figure 2-6. hash1 and hash2 registers
[register bit-field diagram: htaborg[0-6] in bits 0-6, hashed page address in bits 7-25, bits 26-31 reserved (0)]

table 2-5 describes the bit settings of the hash1 and hash2 registers.

table 2-5. hash1 and hash2 bit settings

bits   name                 description
0-6    htaborg[0-6]         copy of the upper 7 bits of the htaborg field from sdr1
7-25   hashed page address  address bits 7-25 of the pteg to be searched
26-31  -                    reserved
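the field layouts in table 2-4 and table 2-5 can be exercised with a small bit-field helper. the python sketch below is illustrative only (all function names are hypothetical); it extracts the dcmp/icmp and hash1/hash2 fields using the bit ranges from the tables and performs the pte comparison that the text says table search software should make.

```python
def field(value, first, last, width=32):
    """Extract bits first..last inclusive, where bit 0 is the MSB."""
    length = last - first + 1
    shift = width - 1 - last
    return (value >> shift) & ((1 << length) - 1)


def decode_cmp(reg):
    """Split a dcmp or icmp value into the fields of table 2-4."""
    return {
        "v": field(reg, 0, 0),
        "vsid": field(reg, 1, 24),
        "api": field(reg, 26, 31),
    }


def decode_hash(reg):
    """Split a hash1 or hash2 value into the fields of table 2-5."""
    return {
        "htaborg": field(reg, 0, 6),
        "hashed_page_address": field(reg, 7, 25),
    }


def pte_matches(pte_first_word, cmp_reg):
    """A pte matches when its first word equals the compare register."""
    return pte_first_word == cmp_reg


sample = (1 << 31) | (0xABCDEF << 7) | 0x3F
print(decode_cmp(sample))
```

because the lower 6 bits of hash1 and hash2 are cleared, the register value can be used directly as the address of the pteg to search.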
2.1.2.5 required physical address register (rpa)

the rpa register is shown in figure 2-7. during a page table search operation, the software must load the rpa with the second word of the correct pte. when the tlbld or tlbli instruction is executed, the contents of the rpa register and the dmiss or imiss register are merged and loaded into the selected tlb entry. the referenced (r) bit is ignored when the write occurs (no location exists in the tlb entry for this bit). the rpa register can be both read and written by software.

figure 2-7. required physical address register (rpa)
[register bit-field diagram: rpn in bits 0-19, bits 20-22 reserved (0), r in bit 23, c in bit 24, wimg in bits 25-28, bit 29 reserved (0), pp in bits 30-31]

table 2-6 describes the bit settings of the rpa register.

table 2-6. rpa bit settings

bits   name  description
0-19   rpn   physical page number from pte
20-22  -     reserved
23     r     referenced bit from pte
24     c     changed bit from pte
25-28  wimg  memory/cache access attribute bits
29     -     reserved
30-31  pp    page protection bits from pte

2.1.2.6 instruction address breakpoint register (iabr)

the iabr, shown in figure 2-8, controls the instruction address breakpoint exception. iabr[cea] holds an effective address to which each instruction is compared. the exception is enabled by setting bit 30 of iabr. the exception is taken when there is an instruction address breakpoint match on the next instruction to complete. the instruction tagged with the match will not be completed before the breakpoint exception is taken.

figure 2-8. instruction address breakpoint register (iabr)
[register bit-field diagram: cea in bits 0-29, ie in bit 30, bit 31 reserved (0)]
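the rpa and iabr layouts described above can be modeled with a few lines of python. this is a sketch under stated assumptions: the function names are hypothetical, bit 0 is taken as the most-significant bit, and the breakpoint comparison is a simplified software model of the word-address match, not a description of the hardware logic.

```python
def field(value, first, last, width=32):
    """Extract bits first..last inclusive, where bit 0 is the MSB."""
    length = last - first + 1
    shift = width - 1 - last
    return (value >> shift) & ((1 << length) - 1)


def decode_rpa(rpa):
    """Split an rpa value into the named fields of the rpa layout."""
    return {
        "rpn": field(rpa, 0, 19),
        "r": field(rpa, 23, 23),
        "c": field(rpa, 24, 24),
        "wimg": field(rpa, 25, 28),
        "pp": field(rpa, 30, 31),
    }


IABR_IE = 1 << (31 - 30)       # enable bit (bit 30)
IABR_CEA_MASK = 0xFFFFFFFC     # word address in bits 0-29


def make_iabr(address, enable=True):
    """Build an iabr value from a word-aligned instruction address."""
    value = address & IABR_CEA_MASK
    if enable:
        value |= IABR_IE
    return value


def iabr_matches(iabr, instruction_address):
    """Simplified model: enabled and the word addresses are equal."""
    enabled = bool(iabr & IABR_IE)
    return enabled and (iabr & IABR_CEA_MASK) == (instruction_address & IABR_CEA_MASK)


print(hex(make_iabr(0x00001234)))
```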
the bits in the iabr are defined as shown in table 2-7.

table 2-7. instruction address breakpoint register bit settings

bit   description
0-29  word address to be compared
30    iabr enabled. setting this bit indicates that the iabr exception is enabled.
31    reserved

2.1.2.7 run_n counter register (run_n)

the 33-bit run_n counter register is unique to the pid7v-603e. the run_n counter is used by the cop to control the number of processor cycles that the processor runs before halting. the most-significant 32 bits form a 32-bit counter. the function of the least-significant bit remains unchanged.

2.2 operand conventions

this section describes the operand conventions as they are represented in two levels of the powerpc architecture. it also provides detailed descriptions of conventions used for storing values in registers and memory, accessing the 603e's registers, and representation of data in these registers.

2.2.1 floating-point execution models: uisa

note that the floating-point execution models are not supported on the EC603E microprocessor.

the ieee 754 standard includes 64- and 32-bit arithmetic. the standard requires that single-precision arithmetic be provided for single-precision operands. the standard permits double-precision arithmetic instructions to have either (or both) single-precision or double-precision operands, but states that single-precision arithmetic instructions should not accept double-precision operands.

the powerpc uisa follows these guidelines:

double-precision arithmetic instructions may have single-precision operands but always produce double-precision results.
single-precision arithmetic instructions require all operands to be single-precision and always produce single-precision results.

for arithmetic instructions, conversions from double- to single-precision must be done explicitly by software, while conversions from single- to double-precision are done implicitly.

all powerpc implementations provide the equivalent of the following execution models to ensure that identical results are obtained. the definition of the arithmetic instructions for infinities, denormalized numbers, and nans follows conventions described in the following sections.

although the double-precision format specifies an 11-bit exponent, exponent arithmetic uses two additional bit positions to avoid potential transient overflow conditions. an extra bit is required when denormalized double-precision numbers are prenormalized. a second bit is required to permit computation of the adjusted exponent value in the following examples when the corresponding exception enable bit is 1:

underflow during multiplication using a denormalized factor
overflow during division using a denormalized divisor

2.2.2 data organization in memory and data transfers

bytes in memory are numbered consecutively starting with 0. each number is the address of the corresponding byte.

memory operands may be bytes, half words, words, or double words, or, for the load/store multiple and move assist instructions, a sequence of bytes or words. the address of a memory operand is the address of its first byte (that is, of its lowest-numbered byte). operand length is implicit for each instruction.

2.2.3 alignment and misaligned accesses

the operand of a single-register memory access instruction has a natural alignment boundary equal to the operand length. in other words, the "natural" address of an operand is an integral multiple of the operand length. a memory operand is said to be aligned if it is aligned at its natural boundary; otherwise it is misaligned.

operands for single-register memory access instructions have the characteristics shown in table 2-8. (although not permitted as memory operands, quad words are shown because quad-word alignment is desirable for certain memory operands.)

table 2-8. memory operands

operand      length    addr[28-31] if aligned
byte         8 bits    xxxx
half word    2 bytes   xxx0
word         4 bytes   xx00
double word  8 bytes   x000
quad word    16 bytes  0000

note: an "x" in an address bit position indicates that the bit can be 0 or 1 independent of the state of other bits in the address.
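the natural-alignment rule and the addr[28-31] patterns of table 2-8 can be checked mechanically. the python sketch below is illustrative, with hypothetical names; it tests whether an address is an integral multiple of the operand length and regenerates the low-order address pattern for each operand type.

```python
# Operand lengths in bytes, taken from table 2-8.
OPERAND_BYTES = {
    "byte": 1,
    "half word": 2,
    "word": 4,
    "double word": 8,
    "quad word": 16,
}


def is_aligned(address, operand):
    """True when address is an integral multiple of the operand length."""
    return address % OPERAND_BYTES[operand] == 0


def low_bits_pattern(operand):
    """Return the addr[28-31] pattern: 'x' for don't-care, '0' for zero."""
    zeros = OPERAND_BYTES[operand].bit_length() - 1  # log2 of the length
    return "x" * (4 - zeros) + "0" * zeros


for name in OPERAND_BYTES:
    print(name, low_bits_pattern(name))
```

the same test applies to larger data items: a 12-byte item whose address is a multiple of four is word-aligned, so `is_aligned(address, "word")` is the relevant check.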
the concept of alignment is also applied more generally to data in memory. for example, a 12-byte data item is said to be word-aligned if its address is a multiple of four.

implementation notes: the following describes how the 603e handles alignment and misaligned accesses:

the 603e provides hardware support for some misaligned memory accesses. however, misaligned accesses will suffer a performance degradation compared to aligned accesses of the same type.

the 603e does not provide hardware support for floating-point load/store operations that are not word-aligned. in such a case, the 603e will invoke an alignment exception and the exception handler must break up the misaligned access. for this reason, floating-point single- and double-word accesses should always be word-aligned. note that a floating-point double-word access on a word-aligned boundary requires an extra cycle to complete. (floating-point operations are not supported on the EC603E microprocessor.)

any memory access that crosses an alignment boundary must be broken into multiple discrete accesses. this includes half-word, word, double-word, and string references. for the case of string accesses, the hardware makes no attempt to get aligned in an effort to reduce the number of discrete accesses. (multiword accesses are architecturally required to be aligned.) the resulting performance degradation depends upon how well each individual access behaves with respect to the memory hierarchy. at a minimum, additional cache access cycles are required. more dramatically, for the case of access to a noncacheable page, each discrete access involves an individual bus operation which will reduce the effective bandwidth of the bus.

the frequent use of misaligned accesses is discouraged since they can compromise the overall performance of the processor.
2.2.4 floating-point operand

the 603e provides hardware support for all single- and double-precision floating-point operations (not supported on the EC603E microprocessor) for most value representations and all rounding modes. the powerpc architecture provides for hardware to implement a floating-point system as defined in ansi/ieee standard 754-1985, ieee standard for binary floating point arithmetic. for detailed information about the floating-point execution model refer to chapter 3, "operand conventions," in the programming environments manual.

2.2.5 effect of operand placement on performance

the vea states that the placement (location and alignment) of operands in memory affects the relative performance of memory accesses. the best performance is guaranteed if memory operands are aligned on natural boundaries. to obtain the best performance from the 603e, the programmer should assume the performance model described in chapter 3, "operand conventions," in the programming environments manual.
2.3 instruction set summary

this section describes instructions and addressing modes defined for the 603e. these instructions are divided into the following functional categories:

integer instructions: these include arithmetic and logical instructions. for more information, see section 2.3.4.1, "integer instructions."
floating-point instructions: these include floating-point arithmetic instructions, as well as instructions that affect the floating-point status and control register (fpscr). for more information, see section 2.3.4.2, "floating-point instructions." (note that floating-point operations are not supported on the EC603E microprocessor.)
load and store instructions: these include integer and floating-point load and store instructions. for more information, see section 2.3.4.3, "load and store instructions."
flow control instructions: these include branching instructions, condition register logical instructions, and other instructions that affect the instruction flow. for more information, see section 2.3.4.4, "branch and flow control instructions."
trap instructions: these instructions are used to test for a specified set of conditions; see section 2.3.4.5, "trap instructions," for more information.
processor control instructions: these instructions are used for synchronizing memory accesses and managing caches, tlbs, and segment registers. for more information, see sections 2.3.4.6, 2.3.5.1, and 2.3.6.2.
memory synchronization instructions: these instructions are used for memory synchronizing. see sections 2.3.4.7 and 2.3.5.2 for more information.
memory control instructions: these instructions provide control of caches, tlbs, and segment registers. for more information, see sections 2.3.5.3 and 2.3.6.3.
system linkage instructions: for more information, see section 2.3.6.1, "system linkage instructions."
external control instructions: these include instructions for use with special input/output devices. for more information, see section 2.3.5.4, "external control instructions."

note that this grouping of instructions does not necessarily indicate the execution unit that processes a particular instruction or group of instructions. this information, which is useful in taking full advantage of the 603e's superscalar parallel instruction execution, is provided in chapter 8, "instruction set," in the programming environments manual.

integer instructions operate on word operands. floating-point instructions operate on single-precision and double-precision floating-point operands. the powerpc architecture uses instructions that are four bytes long and word-aligned. it provides for byte, half-word, and word operand loads and stores between memory and a set of 32 general-purpose registers (gprs). it also provides for word and double-word operand loads and stores between memory and a set of 32 floating-point registers (fprs).
arithmetic and logical instructions do not read or modify memory. to use the contents of a memory location in a computation and then modify the same or another memory location, the memory contents must be loaded into a register, modified, and then written to the target location using load and store instructions.

the description of each instruction includes the mnemonic and a formatted list of operands. to simplify assembly language programming, a set of simplified mnemonics (extended mnemonics in the architecture specification) and symbols is provided for some of the frequently used instructions; see appendix f, "simplified mnemonics," in the programming environments manual for a complete list of simplified mnemonic examples.

2.3.1 classes of instructions

the 603e instructions belong to one of the following three classes:

• defined
• illegal
• reserved

note that while the definitions of these terms are consistent among the powerpc processors, the assignment of these classifications is not. for example, an instruction that is specific to 64-bit implementations is considered defined for 64-bit implementations but illegal for 32-bit implementations such as the 603e.

the class is determined by examining the primary opcode and the extended opcode, if any. if the opcode, or combination of opcode and extended opcode, is not that of a defined instruction or of a reserved instruction, the instruction is illegal.

in future versions of the powerpc architecture, instruction codings that are now illegal may become assigned to instructions in the architecture, or may be reserved by being assigned to processor-specific instructions.

2.3.1.1 definition of boundedly undefined

if instructions are encoded with incorrectly set bits in reserved fields, the results on execution can be said to be boundedly undefined.
if a user-level program executes the incorrectly coded instruction, the resulting undefined results are bounded in that a spurious change from user to supervisor state is not allowed, and the level of privilege exercised by the program in relation to memory access and other system resources cannot be exceeded. boundedly undefined results for a given instruction may vary between implementations, and between execution attempts in the same implementation.

2.3.1.2 defined instruction class

defined instructions are guaranteed to be supported in all powerpc implementations, except as stated in the instruction descriptions in chapter 8, "instruction set," in the programming environments manual. the 603e provides hardware support for all instructions defined for 32-bit implementations (the EC603E microprocessor supports all 32-bit instructions with the exception of those defined for floating-point operations).

a powerpc processor invokes the illegal instruction error handler (part of the program exception) when unimplemented powerpc instructions are encountered so they may be emulated in software, as required.

a defined instruction can have invalid forms, as described in the following subsection.

2.3.1.3 illegal instruction class

illegal instructions can be grouped into the following categories:

• instructions that are not implemented in the powerpc architecture. these opcodes are available for future extensions of the powerpc architecture; that is, future versions of the powerpc architecture may define any of these instructions to perform new functions. the following primary opcodes are defined as illegal but may be used in future extensions to the architecture:
  1, 4, 5, 6, 9, 22, 56, 57, 60, 61
• instructions that are implemented in the powerpc architecture but are not implemented in a specific powerpc implementation. for example, instructions that can be executed on 64-bit powerpc processors are considered illegal by 32-bit processors. the following primary opcodes are defined for 64-bit implementations only and are illegal on the 603e:
  2, 30, 58, 62
• all unused extended opcodes are illegal. the unused extended opcodes can be determined from information in section a.2, "instructions sorted by opcode," and section 2.3.1.4, "reserved instruction class." notice that extended opcodes for instructions that are defined only for 64-bit implementations are illegal in 32-bit implementations, and vice versa. the following primary opcodes have unused extended opcodes:
  17, 19, 31, 59, 63 (primary opcodes 30 and 62 are illegal for all 32-bit implementations, but as 64-bit opcodes they have some unused extended opcodes)
• an instruction consisting entirely of zeros is guaranteed to be an illegal instruction. this increases the probability that an attempt to execute data or uninitialized memory invokes the system illegal instruction error handler (a program exception). note that if only the primary opcode consists of all zeros, the instruction is considered a reserved instruction. this is further described in section 2.3.1.4, "reserved instruction class."
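as a rough illustration of the primary-opcode rules quoted in section 2.3.1.3, the following python sketch sorts a 32-bit instruction word into a coarse class. the function name and the returned strings are hypothetical and chosen only for this example; the opcode sets are copied from the lists in this section. the sketch does not decode extended opcodes, so it cannot by itself confirm that an instruction is defined.

```python
# Sketch only: coarse classification of a 32-bit instruction word by its
# primary opcode. Names and return strings are hypothetical; the opcode
# sets are the ones listed in section 2.3.1.3.

UNASSIGNED = {1, 4, 5, 6, 9, 22, 56, 57, 60, 61}  # illegal, free for future use
ONLY_64_BIT = {2, 30, 58, 62}                      # illegal on 32-bit parts
HAS_EXTENDED = {17, 19, 31, 59, 63}                # some extended opcodes unused


def classify_by_primary_opcode(word: int) -> str:
    """Return a coarse class for a 32-bit instruction word."""
    word &= 0xFFFFFFFF
    if word == 0:
        return "illegal (all zeros)"
    # Primary opcode is bits 0-5 in PowerPC numbering (bit 0 = MSB).
    primary = word >> 26
    if primary == 0:
        return "reserved (primary opcode 0)"
    if primary in UNASSIGNED:
        return "illegal (unassigned primary opcode)"
    if primary in ONLY_64_BIT:
        return "illegal (64-bit only)"
    if primary in HAS_EXTENDED:
        return "check extended opcode"
    return "not illegal by primary opcode alone"
```

a word that falls into the "check extended opcode" class still has to be compared against the opcode listings in section a.2 before it can be treated as defined.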
an attempt to execute an illegal instruction invokes the illegal instruction error handler (a program exception) but has no other effect. see section 4.5.7, "program exception (0x00700)," for additional information about illegal and invalid instruction exceptions.

with the exception of the instruction consisting entirely of binary zeros, the illegal instructions are available for further additions to the powerpc architecture.

2.3.1.4 reserved instruction class

reserved instructions are allocated to specific implementation-dependent purposes not defined by the powerpc architecture. an attempt to execute an unimplemented reserved instruction invokes the illegal instruction error handler (a program exception). see section 4.5.7, "program exception (0x00700)," for additional information about illegal and invalid instruction exceptions.

the following types of instructions are included in this class:

• implementation-specific instructions (for example, load data tlb entry (tlbld) and load instruction tlb entry (tlbli) instructions)
• optional instructions defined by the powerpc architecture but not implemented by the 603e (for example, floating square root (fsqrt) and floating square root single (fsqrts) instructions)

2.3.2 addressing modes

this section provides an overview of conventions for addressing memory and for calculating effective addresses as defined by the powerpc architecture for 32-bit implementations. for more detailed information, see "conventions," in chapter 4, "addressing modes and instruction set summary," of the programming environments manual.

2.3.2.1 memory addressing

a program references memory using the effective (logical) address computed by the processor when it executes a memory access or branch instruction or when it fetches the next sequential instruction.

2.3.2.2 memory operands

bytes in memory are numbered consecutively starting with zero.
each number is the address of the corresponding byte.

memory operands may be bytes, half words, words, or double words, or, for the load/store multiple and load/store string instructions, a sequence of bytes or words. the address of a memory operand is the address of its first byte (that is, of its lowest-numbered byte). operand length is implicit for each instruction. the powerpc architecture supports both big-endian and little-endian byte ordering. the default byte and bit ordering is big-endian. see "byte ordering" in chapter 3, "operand conventions," in the programming environments manual for more information about big-endian and little-endian byte ordering.

the operand of a single-register memory access instruction has a natural alignment boundary equal to the operand length. in other words, the "natural" address of an operand is an integral multiple of the operand length. a memory operand is said to be aligned if it is aligned at its natural boundary; otherwise it is misaligned. for a detailed discussion about memory operands, see chapter 3, "operand conventions," in the programming environments manual.

2.3.2.3 effective address calculation

an effective address (ea) is the 32-bit sum computed by the processor when executing a memory access or branch instruction or when fetching the next sequential instruction. for a memory access instruction, if the sum of the effective address and the operand length exceeds the maximum effective address, the memory operand is considered to wrap around from the maximum effective address through effective address 0, as described in the following paragraphs.

effective address computations for both data and instruction accesses use 32-bit unsigned binary arithmetic. a carry from bit 0 is ignored.

load and store operations have three categories of effective address generation:

• register indirect with immediate index mode
• register indirect with index mode
• register indirect mode

refer to section 2.3.4.3.2, "integer load and store address generation," for further discussion of effective address generation for load and store operations.

branch instructions have three categories of effective address generation:

• immediate
• link register indirect
• count register indirect

refer to section 2.3.4.4.1, "branch instruction address calculation," for further discussion of branch instruction effective address generation.

2.3.2.4 synchronization

the synchronization described in this section refers to the state of the processor that is performing the synchronization.
2.3.2.4.1 context synchronization

the system call (sc) and return from interrupt (rfi) instructions perform context synchronization by allowing previously issued instructions to complete before performing a change in context. execution of one of these instructions ensures the following:

• no higher priority exception exists (sc).
• all previous instructions have completed to a point where they can no longer cause an exception. if a prior memory access instruction causes direct-store error exceptions, the results are guaranteed to be determined before this instruction is executed.
• previous instructions complete execution in the context (privilege, protection, and address translation) under which they were issued.
• the instructions following the sc or rfi instruction execute in the context established by these instructions.

2.3.2.4.2 execution synchronization

an instruction is execution synchronizing if all previously initiated instructions appear to have completed before the instruction is initiated or, in the case of the synchronize (sync) and instruction synchronize (isync) instructions, before the instruction completes. for example, the move to machine state register (mtmsr) instruction is execution synchronizing. it ensures that all preceding instructions have completed execution and will not cause an exception before the instruction executes, but does not ensure subsequent instructions execute in the newly established environment. for example, if the mtmsr sets the msr[pr] bit, unless an isync immediately follows the mtmsr instruction, a privileged instruction could be executed or privileged access could be performed without causing an exception even though the msr[pr] bit indicates user mode.

2.3.2.4.3 instruction-related exceptions

there are two kinds of exceptions in the 603e: those caused directly by the execution of an instruction and those caused by an asynchronous event.
either may cause components of the system software to be invoked.

exceptions can be caused directly by the execution of an instruction as follows:

• an attempt to execute an illegal instruction causes the illegal instruction (program exception) handler to be invoked. an attempt by a user-level program to execute the supervisor-level instructions listed below causes the privileged instruction (program exception) handler to be invoked. the 603e provides the following supervisor-level instructions: dcbi, mfmsr, mfspr, mfsr, mfsrin, mtmsr, mtspr, mtsr, mtsrin, rfi, tlbie, tlbsync, tlbld, and tlbli. note that the privilege level of the mfspr and mtspr instructions depends on the spr encoding.
• an attempt to access memory that is not available (page fault) causes the isi exception handler to be invoked.
• an attempt to access memory with an effective address alignment that is invalid for the instruction causes the alignment exception handler to be invoked.
• the execution of an sc instruction invokes the system call exception handler that permits a program to request the system to perform a service.
• the execution of a trap instruction invokes the program exception trap handler.
• the execution of a floating-point instruction when floating-point instructions are disabled or unavailable invokes the floating-point unavailable exception handler.
• the execution of an instruction that causes a floating-point exception while exceptions are enabled in the msr invokes the program exception handler.

exceptions caused by asynchronous events are described in chapter 4, "exceptions."

2.3.3 instruction set overview

this section provides a brief overview of the powerpc instructions implemented in the 603e and highlights any special information with respect to how the 603e implements a particular instruction. note that the categories used in this section correspond to those used in chapter 4, "addressing modes and instruction set summary," in the programming environments manual. these categorizations are somewhat arbitrary and are provided for the convenience of the programmer and do not necessarily reflect the powerpc architecture specification.

note that some of the instructions have the following optional features:

• cr update: the dot (.) suffix on the mnemonic enables the update of the cr.
• overflow option: the o suffix indicates that the overflow bit in the xer is enabled.

2.3.4 powerpc uisa instructions

the powerpc uisa includes the base user-level instruction set (excluding a few user-level cache control, synchronization, and time base instructions), user-level registers, programming model, data types, and addressing modes. this section discusses the instructions defined in the uisa.

2.3.4.1 integer instructions

this section describes the integer instructions.
these consist of the following:

• integer arithmetic instructions
• integer compare instructions
• integer logical instructions
• integer rotate and shift instructions

integer instructions use the content of the gprs as source operands and place results into gprs, into the xer, and into condition register (cr) fields.
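the cr update and overflow options mentioned in section 2.3.3 can be illustrated with a small python model of an add with both options enabled (addo.). this is a sketch, not the 603e's implementation: the helper names are hypothetical, and the cr0 and xer[so] details follow the general powerpc architecture definition rather than anything stated in this section.

```python
# Sketch only: models the architectural effect of `addo.` on a 32-bit
# implementation. Function names are hypothetical.

MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def addo_record(ra: int, rb: int, so: bool = False):
    """Return (result, cr0, ov, so) for rD = rA + rB with OE = 1 and Rc = 1."""
    result = (ra + rb) & MASK32
    exact = to_signed32(ra) + to_signed32(rb)
    ov = exact != to_signed32(result)   # signed overflow sets XER[OV]
    so = so or ov                       # XER[SO] is sticky
    signed = to_signed32(result)
    # CR0 holds LT, GT, EQ from a signed compare of the result with zero,
    # plus a copy of the final XER[SO].
    cr0 = {"lt": signed < 0, "gt": signed > 0, "eq": signed == 0, "so": so}
    return result, cr0, ov, so
```

for example, adding 1 to 0x7fffffff in this model yields 0x80000000 with ov and so set and cr0 indicating a negative result.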
2.3.4.1.1 integer arithmetic instructions

table 2-9 lists the integer arithmetic instructions for the 603e.

although there is no subtract immediate instruction, its effect can be achieved by using an addi instruction with the immediate operand negated. simplified mnemonics are provided that include this negation. the subf instructions subtract the second operand (rA) from the third operand (rB). simplified mnemonics are provided in which the third operand is subtracted from the second operand. see appendix f, "simplified mnemonics," in the programming environments manual for examples.

table 2-9. integer arithmetic instructions

name                                mnemonic                            operand syntax
add immediate                       addi                                rD,rA,simm
add immediate shifted               addis                               rD,rA,simm
add                                 add (add. addo addo.)               rD,rA,rB
subtract from                       subf (subf. subfo subfo.)           rD,rA,rB
add immediate carrying              addic                               rD,rA,simm
add immediate carrying and record   addic.                              rD,rA,simm
subtract from immediate carrying    subfic                              rD,rA,simm
add carrying                        addc (addc. addco addco.)           rD,rA,rB
subtract from carrying              subfc (subfc. subfco subfco.)       rD,rA,rB
add extended                        adde (adde. addeo addeo.)           rD,rA,rB
subtract from extended              subfe (subfe. subfeo subfeo.)       rD,rA,rB
add to minus one extended           addme (addme. addmeo addmeo.)       rD,rA
subtract from minus one extended    subfme (subfme. subfmeo subfmeo.)   rD,rA
add to zero extended                addze (addze. addzeo addzeo.)       rD,rA
subtract from zero extended         subfze (subfze. subfzeo subfzeo.)   rD,rA
negate                              neg (neg. nego nego.)               rD,rA
multiply low immediate              mulli                               rD,rA,simm
multiply low                        mullw (mullw. mullwo mullwo.)       rD,rA,rB
multiply high word                  mulhw (mulhw.)                      rD,rA,rB
multiply high word unsigned         mulhwu (mulhwu.)                    rD,rA,rB
divide word                         divw (divw. divwo divwo.)           rD,rA,rB
divide word unsigned                divwu (divwu. divwuo divwuo.)       rD,rA,rB

2.3.4.1.2 integer compare instructions

the integer compare instructions algebraically or logically compare the contents of rA with either the uimm operand, the simm operand, or the contents of rB. the comparison is signed for the cmpi and cmp instructions, and unsigned for the cmpli and cmpl instructions. table 2-10 lists the integer compare instructions.

the crfD operand can be omitted if the result of the comparison is to be placed in cr0. otherwise the target cr field must be specified in the instruction crfD field. for more information refer to appendix f, "simplified mnemonics," in the programming environments manual.

table 2-10. integer compare instructions

name                        mnemonic    operand syntax
compare immediate           cmpi        crfD,l,rA,simm
compare                     cmp         crfD,l,rA,rB
compare logical immediate   cmpli       crfD,l,rA,uimm
compare logical             cmpl        crfD,l,rA,rB

2.3.4.1.3 integer logical instructions

the logical instructions shown in table 2-11 perform bit-parallel operations. logical instructions with the cr update enabled and instructions andi. and andis. set cr field cr0 to characterize the result of the logical operation. these fields are set as if the sign-extended low-order 32 bits of the result were algebraically compared to zero. logical instructions without cr update and the remaining logical instructions do not modify the cr. logical instructions do not affect the xer[so], xer[ov], and xer[ca] bits.

for simplified mnemonics examples for the integer logical operations see appendix f, "simplified mnemonics," in the programming environments manual.

table 2-11. integer logical instructions

name                        mnemonic            operand syntax
and immediate               andi.               rA,rS,uimm
and immediate shifted       andis.              rA,rS,uimm
or immediate                ori                 rA,rS,uimm
or immediate shifted        oris                rA,rS,uimm
xor immediate               xori                rA,rS,uimm
xor immediate shifted       xoris               rA,rS,uimm
and                         and (and.)          rA,rS,rB
or                          or (or.)            rA,rS,rB
xor                         xor (xor.)          rA,rS,rB
nand                        nand (nand.)        rA,rS,rB
nor                         nor (nor.)          rA,rS,rB
equivalent                  eqv (eqv.)          rA,rS,rB
and with complement         andc (andc.)        rA,rS,rB
or with complement          orc (orc.)          rA,rS,rB
extend sign byte            extsb (extsb.)      rA,rS
extend sign half word       extsh (extsh.)      rA,rS
count leading zeros word    cntlzw (cntlzw.)    rA,rS

2.3.4.1.4 integer rotate and shift instructions

rotation operations are performed on data from a gpr, and the result, or a portion of the result, is returned to a gpr. see appendix f, "simplified mnemonics," in the programming environments manual for a complete list of simplified mnemonics that allows simpler coding of often-used functions such as clearing the leftmost or rightmost bits of a register, left justifying or right justifying an arbitrary field, and simple rotates and shifts.

integer rotate instructions rotate the contents of a register. the result of the rotation is either inserted into the target register under control of a mask (if a mask bit is 1 the associated bit of the rotated data is placed into the target register, and if the mask bit is 0 the associated bit in the target register is unchanged), or anded with a mask before being placed into the target register. the integer rotate instructions are listed in table 2-12.

the integer shift instructions perform left and right shifts. immediate-form logical (unsigned) shift operations are obtained by specifying masks and shift values for certain rotate instructions. simplified mnemonics are provided to make coding of such shifts simpler and easier to understand. multiple-precision shifts can be programmed as shown in appendix c, "multiple-precision shifts," in the programming environments manual.

table 2-12. integer rotate instructions

name                                            mnemonic            operand syntax
rotate left word immediate then and with mask   rlwinm (rlwinm.)    rA,rS,sh,mb,me
rotate left word then and with mask             rlwnm (rlwnm.)      rA,rS,rB,mb,me
rotate left word immediate then mask insert     rlwimi (rlwimi.)    rA,rS,sh,mb,me
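the rotate-then-mask behavior described for the integer rotate instructions can be sketched in python as follows. this is an illustrative model under stated assumptions: bits are numbered with bit 0 as the most significant bit, the mask wraps around when mb is greater than me (as in the general powerpc architecture definition), and all function names are hypothetical.

```python
# Sketch only: models rlwinm and rlwimi on 32-bit values.
# Bit 0 is the most significant bit, following PowerPC numbering.

MASK32 = 0xFFFFFFFF


def rotl32(value: int, sh: int) -> int:
    """Rotate a 32-bit value left by sh bit positions."""
    sh &= 31
    value &= MASK32
    return ((value << sh) | (value >> (32 - sh))) & MASK32


def mask32(mb: int, me: int) -> int:
    """Ones from bit mb through bit me inclusive; wraps if mb > me."""
    from_mb = MASK32 >> mb                    # ones in bits mb..31
    up_to_me = (MASK32 << (31 - me)) & MASK32  # ones in bits 0..me
    return (from_mb & up_to_me) if mb <= me else (from_mb | up_to_me)


def rlwinm(rs: int, sh: int, mb: int, me: int) -> int:
    """Rotate left word immediate, then AND with mask."""
    return rotl32(rs, sh) & mask32(mb, me)


def rlwimi(ra: int, rs: int, sh: int, mb: int, me: int) -> int:
    """Rotate left word immediate, then insert under mask into rA."""
    m = mask32(mb, me)
    return (rotl32(rs, sh) & m) | (ra & ~m & MASK32)
```

in this model an immediate logical shift left by n is rlwinm(rs, n, 0, 31 - n) and an immediate logical shift right by n is rlwinm(rs, 32 - n, n, 31), which is how the immediate-form shifts mentioned above can be obtained from a rotate instruction.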
the integer shift instructions are listed in table 2-13.

table 2-13. integer shift instructions

name                                    mnemonic          operand syntax
shift left word                         slw (slw.)        rA,rS,rB
shift right word                        srw (srw.)        rA,rS,rB
shift right algebraic word immediate    srawi (srawi.)    rA,rS,sh
shift right algebraic word              sraw (sraw.)      rA,rS,rB

2.3.4.2 floating-point instructions

this section describes the floating-point instructions, which include the following:

• floating-point arithmetic instructions
• floating-point multiply-add instructions
• floating-point rounding and conversion instructions
• floating-point compare instructions
• floating-point status and control register instructions
• floating-point move instructions

the EC603E microprocessor provides hardware support for all 32-bit powerpc instructions with the exception of floating-point instructions, which, when executed on the EC603E microprocessor, take a floating-point unavailable exception.

see section 2.3.4.3, "load and store instructions," for information about floating-point loads and stores.

the powerpc architecture supports a floating-point system as defined in the ieee 754 standard, but requires software support to conform with that standard. all floating-point operations conform to the ieee 754 standard, except if software sets the non-ieee mode bit (ni) in the fpscr; the 603e is in the nondenormalized mode when the ni bit is set in the fpscr. if a denormalized result is produced, a default result of zero is generated. the generated zero has the same sign as the denormalized number. the 603e performs single- and double-precision floating-point operations compliant with the ieee-754 floating-point standard.

implementation note: single-precision denormalized results require two additional processor clock cycles to round. when loading or storing a single-precision denormalized number, the load/store unit may take up to 24 processor clock cycles to convert between the internal double-precision format and the external single-precision format.
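the non-ieee (ni) mode rule in section 2.3.4.2, in which a denormalized result is replaced by a zero of the same sign, can be sketched in python for double-precision values. this is only an illustration of the rule as stated; the function name is hypothetical, and the model ignores single-precision results, status flags, and timing.

```python
# Sketch only: models the "denormalized result becomes a signed zero"
# rule for double-precision values. Not a model of the FPSCR.
import math
import sys


def flush_denormal_to_zero(result: float) -> float:
    """Return a zero with the sign of result if result is denormalized."""
    smallest_normal = sys.float_info.min   # 2**-1022 for IEEE doubles
    if result != 0.0 and abs(result) < smallest_normal:
        return math.copysign(0.0, result)
    return result                          # normal, zero, infinity, or NaN
```

normalized values, zeros, infinities, and nans pass through unchanged in this model.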
2.3.4.2.1 floating-point arithmetic instructions

the floating-point arithmetic instructions are listed in table 2-14. (floating-point instructions are not supported on the EC603E microprocessor.)

table 2-14. floating-point arithmetic instructions

name                                      mnemonic              operand syntax
floating add (double-precision)           fadd (fadd.)          frD,frA,frB
floating add single                       fadds (fadds.)        frD,frA,frB
floating subtract (double-precision)      fsub (fsub.)          frD,frA,frB
floating subtract single                  fsubs (fsubs.)        frD,frA,frB
floating multiply (double-precision)      fmul (fmul.)          frD,frA,frC
floating multiply single                  fmuls (fmuls.)        frD,frA,frC
floating divide (double-precision)        fdiv (fdiv.)          frD,frA,frB
floating divide single                    fdivs (fdivs.)        frD,frA,frB
floating reciprocal estimate single       fres (fres.)          frD,frB
floating reciprocal square root estimate  frsqrte (frsqrte.)    frD,frB
floating select                           fsel (fsel.)          frD,frA,frC,frB

2.3.4.2.2 floating-point multiply-add instructions

these instructions combine multiply and add operations without an intermediate rounding operation. the fractional part of the intermediate product is 106 bits wide, and all 106 bits take part in the add/subtract portion of the instruction.

the floating-point multiply-add instructions are listed in table 2-15. (floating-point instructions are not supported on the EC603E microprocessor.)

table 2-15. floating-point multiply-add instructions

name                                                    mnemonic              operand syntax
floating multiply-add (double-precision)                fmadd (fmadd.)        frD,frA,frC,frB
floating multiply-add single                            fmadds (fmadds.)      frD,frA,frC,frB
floating multiply-subtract (double-precision)           fmsub (fmsub.)        frD,frA,frC,frB
floating multiply-subtract single                       fmsubs (fmsubs.)      frD,frA,frC,frB
floating negative multiply-add (double-precision)       fnmadd (fnmadd.)      frD,frA,frC,frB
floating negative multiply-add single                   fnmadds (fnmadds.)    frD,frA,frC,frB
floating negative multiply-subtract (double-precision)  fnmsub (fnmsub.)      frD,frA,frC,frB
floating negative multiply-subtract single              fnmsubs (fnmsubs.)    frD,frA,frC,frB
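the practical effect of skipping the intermediate rounding in a multiply-add can be seen with a short python comparison. this is a sketch under stated assumptions: it models only double-precision fmadd as (frA x frC) + frB with one final round-to-nearest, it uses exact rational arithmetic rather than a 106-bit datapath, it accepts only finite inputs, and the function names are hypothetical.

```python
# Sketch only: compares a single-rounding multiply-add with a
# multiply followed by a separately rounded add.
from fractions import Fraction


def fmadd_model(fra: float, frc: float, frb: float) -> float:
    """(frA * frC) + frB computed exactly, then rounded once to double."""
    exact = Fraction(fra) * Fraction(frc) + Fraction(frb)
    return float(exact)   # one rounding, round-to-nearest


def unfused_multiply_add(fra: float, frc: float, frb: float) -> float:
    """Product rounded to double first, then the sum rounded again."""
    return fra * frc + frb


a = 1.0 + 2.0 ** -30
c = 1.0 - 2.0 ** -30
b = -1.0
fused_result = fmadd_model(a, c, b)
unfused_result = unfused_multiply_add(a, c, b)
```

with these inputs the exact product is 1 - 2**-60, which a separate multiply rounds to 1.0, so the unfused result is 0.0 while the single-rounding model keeps the -2**-60 term.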
implementation note: single-precision multiply-type instructions operate faster than their double-precision equivalents. see chapter 6, "instruction timing," for more information.

2.3.4.2.3 floating-point rounding and conversion instructions

the floating round to single-precision (frsp) instruction is used to round a 64-bit double-precision number to a 32-bit single-precision floating-point number. the floating-point conversion instructions convert a 64-bit double-precision floating-point number to a 32-bit signed integer number.

the powerpc architecture defines bits 0-31 of floating-point register frD as undefined when executing the floating convert to integer word (fctiw) and floating convert to integer word with round toward zero (fctiwz) instructions. examples of uses of these instructions to perform various conversions can be found in appendix d, "floating-point models," in the programming environments manual.

the floating-point rounding instructions are shown in table 2-16. (floating-point instructions are not supported on the EC603E microprocessor.)

table 2-16. floating-point rounding and conversion instructions

name                                                      mnemonic          operand syntax
floating round to single-precision                        frsp (frsp.)      frD,frB
floating convert to integer word                          fctiw (fctiw.)    frD,frB
floating convert to integer word with round toward zero   fctiwz (fctiwz.)  frD,frB

2.3.4.2.4 floating-point compare instructions

floating-point compare instructions compare the contents of two floating-point registers. the comparison ignores the sign of zero (that is, +0 = -0).

the floating-point compare instructions are listed in table 2-17. (floating-point instructions are not supported on the EC603E microprocessor.)

table 2-17. floating-point compare instructions

name                        mnemonic  operand syntax
floating compare unordered  fcmpu     crfD,frA,frB
floating compare ordered    fcmpo     crfD,frA,frB

2.3.4.2.5 floating-point status and control register instructions

every fpscr instruction appears to synchronize the effects of all floating-point instructions executed by a given processor. executing an fpscr instruction ensures that all floating-point instructions previously initiated by the given processor appear to have completed before the fpscr instruction is initiated and that no subsequent floating-point instructions appear to be initiated by the given processor until the fpscr instruction has completed.

the fpscr instructions are listed in table 2-18. (floating-point instructions are not supported on the EC603E microprocessor.)

table 2-18. floating-point status and control register instructions

name                                    mnemonic          operand syntax
move from fpscr                         mffs (mffs.)      frD
move to condition register from fpscr   mcrfs             crfD,crfS
move to fpscr field immediate           mtfsfi (mtfsfi.)  crfD,imm
move to fpscr fields                    mtfsf (mtfsf.)    fm,frB
move to fpscr bit 0                     mtfsb0 (mtfsb0.)  crbD
move to fpscr bit 1                     mtfsb1 (mtfsb1.)  crbD

implementation note: the architecture notes that, in some implementations, the move to fpscr fields (mtfsfx) instruction may perform more slowly when only a portion of the fields are updated as opposed to all of the fields. this is not the case in the 603e.

2.3.4.2.6 floating-point move instructions

floating-point move instructions copy data from one floating-point register to another. the floating-point move instructions do not modify the fpscr. the cr update option in these instructions controls the placing of result status into cr1.

floating-point move instructions are listed in table 2-19. (floating-point instructions are not supported on the EC603E microprocessor.)

table 2-19. floating-point move instructions

name                              mnemonic        operand syntax
floating move register            fmr (fmr.)      frD,frB
floating negate                   fneg (fneg.)    frD,frB
floating absolute value           fabs (fabs.)    frD,frB
floating negative absolute value  fnabs (fnabs.)  frD,frB

2.3.4.3 load and store instructions

load and store instructions are issued and translated in program order; however, the accesses can occur out of order. synchronizing instructions are provided to enforce strict ordering. this section describes the load and store instructions of the 603e, which consist of the following:

• integer load instructions
• integer store instructions
• integer load and store with byte-reverse instructions
• integer load and store multiple instructions
• integer load and store string instructions
• floating-point load instructions
• floating-point store instructions

2.3.4.3.1 self-modifying code

when a processor modifies a memory location that may be contained in the instruction cache, software must ensure that memory updates are visible to the instruction fetching mechanism. this can be achieved by the following instruction sequence:

    dcbst   |update memory
    sync    |wait for update
    icbi    |remove (invalidate) copy in instruction cache
    isync   |remove copy in own instruction buffer

these operations are required because the data cache is a write-back cache. since instruction fetching bypasses the data cache, changes to items in the data cache may not be reflected in memory until the fetch operations complete.

special care must be taken to avoid coherency paradoxes in systems that implement unified secondary caches, and designers should carefully follow the guidelines for maintaining cache coherency that are provided in the vea, and discussed in chapter 5, "cache model and memory coherency," in the programming environments manual. because the 603e does not broadcast the m bit for instruction fetches, external caches are subject to coherency paradoxes.

2.3.4.3.2 integer load and store address generation

integer load and store operations generate effective addresses using register indirect with immediate index mode, register indirect with index mode, or register indirect mode. see section 2.3.2.3, "effective address calculation," for information about calculating effective addresses. note that the 603e is optimized for load and store operations that are aligned on natural boundaries, and operations that are not naturally aligned may suffer performance degradation. refer to section 4.5.6.1, "integer alignment exceptions," for additional information about load and store address alignment exceptions.
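the address generation modes named above, together with the 32-bit wraparound rule from section 2.3.2.3, can be sketched in python. this is a hedged illustration: the function names and the list used as a register file are hypothetical, and the rule that a base register field of 0 selects the value zero rather than the contents of gpr0 comes from the general powerpc architecture definition, not from this section.

```python
# Sketch only: effective address generation for integer loads and stores
# on a 32-bit implementation. gpr is a hypothetical list of 32 values.

MASK32 = 0xFFFFFFFF


def sign_extend16(d: int) -> int:
    """Sign-extend the 16-bit displacement field d."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def ea_immediate_index(gpr, ra: int, d: int) -> int:
    """Register indirect with immediate index: (rA|0) + sign-extended d."""
    base = 0 if ra == 0 else gpr[ra]
    return (base + sign_extend16(d)) & MASK32   # carry out of bit 0 ignored


def ea_index(gpr, ra: int, rb: int) -> int:
    """Register indirect with index: (rA|0) + (rB)."""
    base = 0 if ra == 0 else gpr[ra]
    return (base + gpr[rb]) & MASK32


def is_naturally_aligned(ea: int, operand_length: int) -> bool:
    """True if ea is an integral multiple of the operand length in bytes."""
    return ea % operand_length == 0
```

register indirect mode corresponds to a displacement of zero in this model, and the alignment helper mirrors the natural-boundary definition given in section 2.3.2.2.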
2.3.4.3.3 Register Indirect Integer Load Instructions

For integer load instructions, the byte, half word, word, or double word addressed by the EA is loaded into rD. Many integer load instructions have an update form, in which rA is updated with the generated effective address. For these forms, the EA is placed into rA and the memory element (byte, half word, word, or double word) addressed by the EA is loaded into rD.

Implementation Note: In some implementations of the PowerPC architecture, the load half word algebraic instructions (lha and lhax) and the load with update (lbzu, lbzux, lhzu, lhzux, lhau, lhaux, lwzu, and lwzux) instructions may execute with greater latency than other types of load instructions. In the 603e, these instructions operate with the same latency as other load instructions.
Table 2-20 lists the integer load instructions.

Table 2-20. Integer Load Instructions

Name                                             Mnemonic    Operand Syntax
Load Byte and Zero                               lbz         rD,d(rA)
Load Byte and Zero Indexed                       lbzx        rD,rA,rB
Load Byte and Zero with Update                   lbzu        rD,d(rA)
Load Byte and Zero with Update Indexed           lbzux       rD,rA,rB
Load Half Word and Zero                          lhz         rD,d(rA)
Load Half Word and Zero Indexed                  lhzx        rD,rA,rB
Load Half Word and Zero with Update              lhzu        rD,d(rA)
Load Half Word and Zero with Update Indexed      lhzux       rD,rA,rB
Load Half Word Algebraic                         lha         rD,d(rA)
Load Half Word Algebraic Indexed                 lhax        rD,rA,rB
Load Half Word Algebraic with Update             lhau        rD,d(rA)
Load Half Word Algebraic with Update Indexed     lhaux       rD,rA,rB
Load Word and Zero                               lwz         rD,d(rA)
Load Word and Zero Indexed                       lwzx        rD,rA,rB
Load Word and Zero with Update                   lwzu        rD,d(rA)
Load Word and Zero with Update Indexed           lwzux       rD,rA,rB

2.3.4.3.4 Integer Store Instructions

For integer store instructions, the contents of rS are stored into the byte, half word, word, or double word in memory addressed by the effective address (EA). Many store instructions have an update form, in which rA is updated with the EA. For these forms, the following rules apply:

• If rA ≠ 0, the EA is placed into rA.
• If rS = rA, the contents of rS are copied to the target memory element, then the generated EA is placed into rA (rS).

The 603e defines store with update instructions with rA = 0 and integer store instructions with the CR update option enabled (Rc field, bit 31, in the instruction encoding = 1) to be invalid forms.

Table 2-21 provides a list of the integer store instructions for the 603e.
Table 2-21. Integer Store Instructions

Name                                     Mnemonic    Operand Syntax
Store Byte                               stb         rS,d(rA)
Store Byte Indexed                       stbx        rS,rA,rB
Store Byte with Update                   stbu        rS,d(rA)
Store Byte with Update Indexed           stbux       rS,rA,rB
Store Half Word                          sth         rS,d(rA)
Store Half Word Indexed                  sthx        rS,rA,rB
Store Half Word with Update              sthu        rS,d(rA)
Store Half Word with Update Indexed      sthux       rS,rA,rB
Store Word                               stw         rS,d(rA)
Store Word Indexed                       stwx        rS,rA,rB
Store Word with Update                   stwu        rS,d(rA)
Store Word with Update Indexed           stwux       rS,rA,rB

2.3.4.3.5 Integer Load and Store with Byte-Reverse Instructions

Table 2-22 describes integer load and store with byte-reverse instructions. When used in a PowerPC system operating with the default big-endian byte order, these instructions have the effect of loading and storing data in little-endian order. Likewise, when used in a PowerPC system operating with little-endian byte order, these instructions have the effect of loading and storing data in big-endian order. For more information about big-endian and little-endian byte ordering, see "Byte Ordering" in Chapter 3, "Operand Conventions," in the Programming Environments Manual.

Implementation Note: In some PowerPC implementations, load byte-reverse instructions (lhbrx and lwbrx) may have greater latency than other load instructions; however, these instructions operate with the same latency as other load instructions in the 603e.

Table 2-22. Integer Load and Store with Byte-Reverse Instructions

Name                                     Mnemonic    Operand Syntax
Load Half Word Byte-Reverse Indexed      lhbrx       rD,rA,rB
Load Word Byte-Reverse Indexed           lwbrx       rD,rA,rB
Store Half Word Byte-Reverse Indexed     sthbrx      rS,rA,rB
Store Word Byte-Reverse Indexed          stwbrx      rS,rA,rB
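The effect of the byte-reverse instructions in the default big-endian mode can be sketched as follows. This is an illustrative model only: the function names mirror the mnemonics for readability, memory is a hypothetical Python bytearray, and the EA is assumed to have been computed already.

```python
def lwz(memory, ea):
    """Ordinary word load in big-endian mode: the byte at ea is most significant."""
    return int.from_bytes(memory[ea:ea + 4], "big")


def lwbrx(memory, ea):
    """Byte-reversed word load: the byte at ea becomes the least significant."""
    return int.from_bytes(memory[ea:ea + 4], "little")


def lhbrx(memory, ea):
    """Byte-reversed half-word load; the upper half of the result is zero."""
    return int.from_bytes(memory[ea:ea + 2], "little")


def stwbrx(memory, ea, value):
    """Byte-reversed word store: least significant byte goes to ea."""
    memory[ea:ea + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")
```

With memory holding the bytes 11 22 33 44 at address 0, `lwz` returns 0x11223344 while `lwbrx` returns 0x44332211, which is the little-endian interpretation of the same four bytes.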
2.3.4.3.6 Integer Load and Store Multiple Instructions

The integer load/store multiple instructions are used to move blocks of data to and from the GPRs. In some implementations, these instructions are likely to have greater latency and take longer to execute, perhaps much longer, than a sequence of individual load or store instructions that produce the same results.

Implementation Notes: The following describes the 603e implementation of the load/store multiple instructions:

• The load multiple and store multiple instructions may have operands that require memory accesses crossing a 4-Kbyte page boundary. As a result, these instructions may be interrupted by a DSI exception associated with the address translation of the second page. In this case, the 603e performs some or all of the memory references from the first page, and none of the memory references from the second page before taking the exception. On return from the DSI exception, the load or store multiple instruction will re-execute from the beginning. For additional information, refer to "DSI Exception (0x00300)" in Chapter 6, "Exceptions," in the Programming Environments Manual.
• The PowerPC architecture defines the load multiple word (lmw) instruction with rA in the range of registers to be loaded as an invalid form. It defines the load multiple and store multiple instructions with misaligned operands (that is, the EA is not a multiple of 4) to cause an alignment exception. The 603e defines the load multiple word (lmw) instruction with rA in the range of registers to be loaded as an invalid form.
• The PowerPC architecture describes some preferred instruction forms for the integer load and store multiple instructions that may perform better than other forms in some implementations. None of these preferred forms have an effect on instruction performance in the 603e.
• When the 603e is operating with little-endian byte order, execution of a load or store multiple instruction causes the system alignment error handler to be invoked; see "Byte Ordering" in Chapter 3, "Operand Conventions," in the Programming Environments Manual for more information.

Table 2-23 lists the integer load and store multiple instructions for the 603e.

Table 2-23. Integer Load and Store Multiple Instructions

Name                   Mnemonic    Operand Syntax
Load Multiple Word     lmw         rD,d(rA)
Store Multiple Word    stmw        rS,d(rA)
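The conditions called out in the implementation notes for lmw can be checked with a short model. This is a sketch under stated assumptions rather than a description of the hardware: the function name and the returned dictionary are hypothetical, and it assumes the architectural behavior that lmw loads consecutive words into rD through r31.

```python
PAGE_SHIFT = 12  # 4-Kbyte pages, as stated in the implementation notes


def describe_lmw(rd, ra, ea):
    """Summarize how an lmw rD,d(rA) with the given EA would be classified."""
    registers = list(range(rd, 32))            # assumed: rD through r31
    last_byte = ea + 4 * len(registers) - 1    # last byte touched by the transfer
    return {
        "registers": registers,
        # rA inside the range of registers to be loaded is an invalid form
        "invalid_form": ra in registers,
        # an EA that is not a multiple of 4 causes an alignment exception
        "alignment_exception": ea % 4 != 0,
        # crossing a page boundary means a DSI exception may interrupt the transfer
        "crosses_page": (ea >> PAGE_SHIFT) != (last_byte >> PAGE_SHIFT),
    }
```

For instance, `describe_lmw(29, 1, 0x1FF8)` reports three registers (r29 to r31) and a transfer that crosses from page 0x1 into page 0x2, so the instruction could be restarted from the beginning after a DSI exception on the second page.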
2.3.4.3.7 Integer Load and Store String Instructions

The integer load and store string instructions allow movement of data from memory to registers or from registers to memory without concern for alignment. These instructions can be used for a short move between arbitrary memory locations or to initiate a long move between misaligned memory fields.

When the 603e is operating with little-endian byte order, execution of a load or store string instruction causes the system alignment error handler to be invoked; see "Byte Ordering" in Chapter 3, "Operand Conventions," in the Programming Environments Manual for more information.

Table 2-24 lists the integer load and store string instructions.

Load string and store string instructions may involve operands that are not word-aligned. As described in "Alignment Exception (0x00600)" in Chapter 6, "Exceptions," in the Programming Environments Manual, a misaligned string operation suffers a performance penalty compared to a word-aligned operation of the same type.

When a string operation crosses a 4-Kbyte boundary, the instruction may be interrupted by a DSI exception associated with the address translation of the second page. In this case, the 603e performs some or all memory references from the first page and none from the second before taking the exception. On return from the DSI exception, the load or store string instruction will re-execute from the beginning. For more information, refer to "DSI Exception (0x00300)" in Chapter 6, "Exceptions," in the Programming Environments Manual.

Implementation Note: If rA is in the range of registers to be loaded for a load string word immediate (lswi) instruction or if either rA or rB is in the range of registers to be loaded for a load string word indexed (lswx) instruction, the PowerPC architecture defines the instruction to be of an invalid form.
In addition, the lswx and stswx instructions that specify a string length of zero are defined to be invalid by the PowerPC architecture. However, neither of these cases holds true for the 603e, which treats these cases as valid forms.

Table 2-24. Integer Load and Store String Instructions

Name                            Mnemonic    Operand Syntax
Load String Word Immediate      lswi        rD,rA,NB
Load String Word Indexed        lswx        rD,rA,rB
Store String Word Immediate     stswi       rS,rA,NB
Store String Word Indexed       stswx       rS,rA,rB
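To see when rA falls "in the range of registers to be loaded" for lswi, it helps to enumerate that range. The sketch below is illustrative only; the function names are hypothetical, and it assumes the architectural rules that an NB field of 0 means 32 bytes, that each register receives up to four bytes, and that the register sequence wraps from r31 to r0.

```python
def lswi_registers(rd, nb):
    """Return the GPR numbers an lswi rD,rA,NB would load, in order."""
    n_bytes = 32 if nb == 0 else nb        # assumed: NB = 0 encodes 32 bytes
    n_regs = (n_bytes + 3) // 4            # four bytes per register, rounded up
    return [(rd + i) % 32 for i in range(n_regs)]  # wraps r31 -> r0


def lswi_is_architecturally_invalid(rd, ra, nb):
    """True if rA lies in the load range (invalid per the architecture).

    Per the implementation note above, the 603e treats this case as valid.
    """
    return ra in lswi_registers(rd, nb)
```

For example, `lswi_registers(30, 10)` gives r30, r31, and r0, so an lswi with rD = 30, rA = 0, and NB = 10 is the architecturally invalid form that the 603e nevertheless executes.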
2.3.4.3.8 Floating-Point Load and Store Address Generation

Floating-point load and store operations generate effective addresses using the register indirect with immediate index addressing mode and register indirect with index addressing mode, the details of which are described below. Floating-point loads and stores are not supported for direct-store accesses. The use of the floating-point load and store operations for direct-store accesses will result in a DSI exception. (Note that floating-point instructions are not supported on the EC603E microprocessor.)

2.3.4.3.9 Floating-Point Load Instructions

There are two forms of the floating-point load instruction: single-precision and double-precision operand formats. Because the FPRs support only the floating-point double-precision format, single-precision floating-point load instructions convert single-precision data to double-precision format before loading the operands into the target FPR. This conversion is described fully in "Floating-Point Load Instructions" in Appendix D, "Floating-Point Models," in the Programming Environments Manual.

Implementation Note: The PowerPC architecture defines load with update instructions with rA = 0 as an invalid form; however, the 603e treats this case as a valid form. On the EC603E microprocessor, floating-point instructions are trapped by the floating-point unavailable exception vector and can be emulated in software.

Table 2-25 provides a list of the floating-point load instructions. (Floating-point instructions are not supported on the EC603E microprocessor.)

Table 2-25. Floating-Point Load Instructions

Name                                              Mnemonic    Operand Syntax
Load Floating-Point Single                        lfs         frD,d(rA)
Load Floating-Point Single Indexed                lfsx        frD,rA,rB
Load Floating-Point Single with Update            lfsu        frD,d(rA)
Load Floating-Point Single with Update Indexed    lfsux       frD,rA,rB
Load Floating-Point Double                        lfd         frD,d(rA)
Load Floating-Point Double Indexed                lfdx        frD,rA,rB
Load Floating-Point Double with Update            lfdu        frD,d(rA)
Load Floating-Point Double with Update Indexed    lfdux       frD,rA,rB

2.3.4.3.10 Floating-Point Store Instructions

There are three basic forms of the store instruction: single-precision, double-precision, and integer. The integer form is supported by the optional stfiwx instruction. Because the FPRs support only floating-point, double-precision format for floating-point data, single-precision floating-point store instructions convert double-precision data to single-precision format before storing the operands. The conversion steps are described fully in "Floating-Point Store Instructions" in Appendix D, "Floating-Point Models," in the Programming Environments Manual.

Implementation Note: The PowerPC architecture defines store with update instructions with rA = 0 as an invalid form; however, the 603e treats this case as valid. On the EC603E microprocessor, floating-point instructions are trapped by the floating-point unavailable exception vector and can be emulated in software.

Table 2-26 provides a list of the floating-point store instructions. (Floating-point instructions are not supported on the EC603E microprocessor.)

Table 2-26. Floating-Point Store Instructions

Name                                               Mnemonic    Operand Syntax
Store Floating-Point Single                        stfs        frS,d(rA)
Store Floating-Point Single Indexed                stfsx       frS,rA,rB
Store Floating-Point Single with Update            stfsu       frS,d(rA)
Store Floating-Point Single with Update Indexed    stfsux      frS,rA,rB
Store Floating-Point Double                        stfd        frS,d(rA)
Store Floating-Point Double Indexed                stfdx       frS,rA,rB
Store Floating-Point Double with Update            stfdu       frS,d(rA)
Store Floating-Point Double with Update Indexed    stfdux      frS,rA,rB
Store Floating-Point as Integer Word Indexed       stfiwx      frS,rA,rB

2.3.4.4 Branch and Flow Control Instructions

Branch instructions are executed by the branch processing unit (BPU). The BPU receives branch instructions from the fetch unit and performs condition register (CR) look-ahead operations on conditional branches to resolve them early, achieving the effect of a zero-cycle branch in many cases.

Some branch instructions can redirect instruction execution conditionally based on the value of bits in the CR. When the branch processor encounters one of these instructions, it scans the execution pipelines to determine whether an instruction in progress may affect the particular CR bit. If no interlock is found, the branch can be resolved immediately by checking the bit in the CR and taking the action defined for the branch instruction.

If an interlock is detected, the branch is considered unresolved and the direction of the branch is predicted using static branch prediction as described in "Conditional Branch Control" in Chapter 4, "Addressing Modes and Instruction Set Summary," in the Programming Environments Manual. The interlock is monitored while instructions are fetched for the predicted branch. When the interlock is cleared, the branch processor determines whether the prediction was correct based on the value of the CR bit. If the prediction is correct, the branch is considered completed and instruction fetching continues.
If the prediction is incorrect, the fetched instructions are purged, and instruction fetching continues along the alternate path. See Chapter 8, "Instruction Timing," in the Programming Environments Manual for more information about how branches are executed.

2.3.4.4.1 Branch Instruction Address Calculation

Branch instructions can alter the sequence of instruction execution. Instruction addresses are always assumed to be word aligned; the processor ignores the two low-order bits of the generated branch target address.

Branch instructions compute the effective address (EA) of the next instruction address using the following addressing modes:

• Branch relative
• Branch conditional to relative address
• Branch to absolute address
• Branch conditional to absolute address
• Branch conditional to link register
• Branch conditional to count register

2.3.4.4.2 Branch Instructions

Table 2-27 lists the branch instructions provided by the PowerPC processors. To simplify assembly language programming, a set of simplified mnemonics and symbols is provided for the most frequently used forms of branch conditional, compare, trap, rotate and shift, and certain other instructions. See Appendix F, "Simplified Mnemonics," in the Programming Environments Manual for a list of simplified mnemonic examples.

Table 2-27. Branch Instructions

Name                                    Mnemonic             Operand Syntax
Branch                                  b (ba bl bla)        target_addr
Branch Conditional                      bc (bca bcl bcla)    BO,BI,target_addr
Branch Conditional to Link Register     bclr (bclrl)         BO,BI
Branch Conditional to Count Register    bcctr (bcctrl)       BO,BI

2.3.4.4.3 Condition Register Logical Instructions

Condition register logical instructions, shown in Table 2-28, and the move condition register field (mcrf) instruction are also defined as flow control instructions, although they are executed by the system register unit (SRU). Most instructions executed by the SRU are completion-serialized to maintain system state; that is, the instruction is held for execution in the SRU until all prior instructions issued have completed.

Note that if the LR update option is enabled for any of these instructions, these forms of the instructions are invalid in the 603e.

Table 2-28. Condition Register Logical Instructions

Name                                      Mnemonic    Operand Syntax
Condition Register AND                    crand       crbD,crbA,crbB
Condition Register OR                     cror        crbD,crbA,crbB
Condition Register XOR                    crxor       crbD,crbA,crbB
Condition Register NAND                   crnand      crbD,crbA,crbB
Condition Register NOR                    crnor       crbD,crbA,crbB
Condition Register Equivalent             creqv       crbD,crbA,crbB
Condition Register AND with Complement    crandc      crbD,crbA,crbB
Condition Register OR with Complement     crorc       crbD,crbA,crbB
Move Condition Register Field             mcrf        crfD,crfS

2.3.4.5 Trap Instructions

The trap instructions shown in Table 2-29 are provided to test for a specified set of conditions. If any of the conditions tested by a trap instruction are met, the system trap handler is invoked. If the tested conditions are not met, instruction execution continues normally. See Appendix F, "Simplified Mnemonics," in the Programming Environments Manual for a complete set of simplified mnemonics.

Table 2-29. Trap Instructions

Name                   Mnemonic    Operand Syntax
Trap Word Immediate    twi         TO,rA,SIMM
Trap Word              tw          TO,rA,rB

2.3.4.6 Processor Control Instructions

Processor control instructions are used to read from and write to the condition register (CR), machine state register (MSR), and special-purpose registers (SPRs), and to read from the time base register (TBU or TBL).
2.3.4.6.1 Move to/from Condition Register Instructions

Table 2-30 lists the instructions provided by the 603e for reading from or writing to the CR.

Table 2-30. Move to/from Condition Register Instructions

Name                                   Mnemonic    Operand Syntax
Move to Condition Register Fields      mtcrf       CRM,rS
Move to Condition Register from XER    mcrxr       crfD
Move from Condition Register           mfcr        rD

2.3.4.7 Memory Synchronization Instructions - UISA

Memory synchronization instructions control the order in which memory operations are completed with respect to asynchronous events, and the order in which memory operations are seen by other processors or memory access mechanisms. See Chapter 3, "Instruction and Data Cache Operation," for additional information about these instructions and about related aspects of memory synchronization.

The sync instruction delays execution of subsequent instructions until previous instructions have completed to the point that they can no longer cause an exception and until all previous memory accesses are performed globally; the sync operation is not broadcast onto the 603e bus interface. Additionally, all load and store cache/bus activities initiated by prior instructions are completed. Touch load operations (dcbt and dcbtst) are required to complete at least through address translation, but are not required to complete on the bus.

The functions performed by the sync instruction normally take a significant amount of time to complete; as a result, frequent use of this instruction may adversely affect performance. In addition, the number of cycles required to complete a sync instruction depends on system parameters and on the processor's state when the instruction is issued.

The proper paired use of the lwarx and stwcx. instructions allows programmers to emulate common semaphore operations such as "test and set," "compare and swap," "exchange memory," and "fetch and add." Examples of these semaphore operations can be found in Appendix E, "Synchronization Programming Examples," in the Programming Environments Manual. The lwarx instruction must be paired with an stwcx. instruction with the same effective address used for both instructions of the pair. Note that the reservation granularity is 32 bytes.

The concept behind the use of the lwarx and stwcx. instructions is that a processor may load a semaphore from memory, compute a result based on the value of the semaphore, and conditionally store it back to the same location (only if that location has not been modified since it was first read), and determine if the store was successful. The conditional store is performed based upon the existence of a reservation established by the preceding lwarx instruction. If the reservation exists when the store is executed, the store is performed and a bit is set in the CR. If the reservation does not exist when the store is executed, the target memory location is not modified and a bit is cleared in the CR.
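The reservation protocol described in this section can be modeled in a few lines to show why a lwarx/stwcx. pair behaves like an atomic update. This is a teaching sketch, not the hardware mechanism: the class, its dictionary-backed memory, and the boolean return value standing in for the CR bit are all hypothetical. The rules for clearing a reservation follow the list given later in this section.

```python
GRANULE = 32  # reservation granularity in bytes, as stated in the text


class ReservationModel:
    """Toy model of one processor's lwarx/stwcx. reservation."""

    def __init__(self):
        self.memory = {}              # word address -> value (hypothetical store)
        self.reserved_granule = None  # at most one reservation exists

    @staticmethod
    def _check_aligned(ea):
        if ea % 4 != 0:
            raise ValueError("lwarx/stwcx. require a word-aligned EA")

    def lwarx(self, ea):
        self._check_aligned(ea)
        self.reserved_granule = ea // GRANULE  # replaces any earlier reservation
        return self.memory.get(ea, 0)

    def stwcx(self, ea, value):
        """Return True if the store was performed (stands in for the CR bit)."""
        self._check_aligned(ea)
        success = self.reserved_granule is not None  # address match not required
        if success:
            self.memory[ea] = value & 0xFFFFFFFF
        self.reserved_granule = None  # any stwcx. clears the reservation
        return success

    def external_store(self, ea, value):
        """Another device modifies memory; clears a matching reservation."""
        self.memory[ea] = value & 0xFFFFFFFF
        if self.reserved_granule == ea // GRANULE:
            self.reserved_granule = None


def fetch_and_add(cpu, ea, delta):
    """Retry loop in the style of a "fetch and add" semaphore operation."""
    while True:
        old = cpu.lwarx(ea)
        if cpu.stwcx(ea, old + delta):
            return old
```

In use, an `external_store` to any word in the same 32-byte granule between the `lwarx` and the `stwcx` makes the conditional store fail, and the loop in `fetch_and_add` simply retries.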
If the store was successful, the sequence of instructions from the read of the semaphore to the store that updated the semaphore appears to have been executed atomically (that is, no other processor or mechanism modified the semaphore location between the read and the update), thus providing the equivalent of a real atomic operation. However, in reality, other processors may have read from the location during this operation. In the 603e, the reservations are made on behalf of aligned 32-byte sections of the memory address space.

The lwarx and stwcx. instructions require the EA to be aligned. Exception handling software should not attempt to emulate a misaligned lwarx or stwcx. instruction, because there is no correct way to define the address associated with the reservation.

In general, the lwarx and stwcx. instructions should be used only in system programs, which can be invoked by application programs as needed.

At most, one reservation exists simultaneously on any processor. The address associated with the reservation can be changed by a subsequent lwarx instruction. The conditional store is performed based upon the existence of a reservation established by the preceding lwarx regardless of whether the address generated by the lwarx matches that generated by the stwcx. instruction. A reservation held by the processor is cleared by one of the following:

• Executing an stwcx. instruction to any address
• Attempt by some other device to modify a location in the reservation granularity (32 bytes)

The lwarx and stwcx. instructions in write-through access mode do not cause a DSI exception.

Table 2-31 lists the UISA memory synchronization instructions for the 603e.

Table 2-31. Memory Synchronization Instructions - UISA

Name                               Mnemonic    Operand Syntax
Load Word and Reserve Indexed      lwarx       rD,rA,rB
Store Word Conditional Indexed     stwcx.      rS,rA,rB
Synchronize                        sync

2.3.5 PowerPC VEA Instructions

The PowerPC VEA describes the semantics of the memory model that can be assumed by software processes, and includes descriptions of the cache model, cache-control instructions, address aliasing, and other related issues.

2.3.5.1 Processor Control Instructions

In addition to the move to condition register instructions specified by the UISA, the VEA defines the move from time base (mftb) instruction for reading the contents of the time base register. The mftb instruction is a user-level instruction; it is shown in Table 2-32.
Table 2-32. Move from Time Base Instruction

Name                   Mnemonic    Operand Syntax
Move from Time Base    mftb        rD,TBR

Simplified mnemonics are provided for the mftb instruction so it can be coded with the TBR name as part of the mnemonic rather than requiring it to be coded as an operand. The mftb instruction serves as both a basic and simplified mnemonic. Assemblers recognize an mftb mnemonic with two operands as the basic form, and an mftb mnemonic with one operand as the simplified form. Simplified mnemonics are also provided for move from time base upper (mftbu), which is a variant of the mftb instruction rather than of mfspr. The 603e ignores the extended opcode differences between mftb and mfspr by ignoring bit 25 of both instructions and treating them both identically. For more information, refer to Appendix F, "Simplified Mnemonics," in the Programming Environments Manual.

2.3.5.2 Memory Synchronization Instructions - VEA

Memory synchronization instructions control the order in which memory operations are completed with respect to asynchronous events, and the order in which memory operations are seen by other processors or memory access mechanisms. See Chapter 3, "Instruction and Data Cache Operation," for additional information about these instructions and about related aspects of memory synchronization.

Implementation Notes: The following describes how the 603e handles memory synchronization in the VEA.

• The instruction synchronize (isync) instruction causes the 603e to discard all prefetched instructions, wait for any preceding instructions to complete, and then branch to the next sequential instruction (which has the effect of clearing the pipeline behind the isync instruction).
• The enforce in-order execution of I/O (eieio) instruction is used to enforce the ordering of noncacheable memory accesses. Since the 603e does not reorder noncacheable memory accesses, the eieio instruction is treated as a no-op.

Table 2-33 lists the VEA memory synchronization instructions for the 603e.

Table 2-33. Memory Synchronization Instructions - VEA

Name                                Mnemonic    Operand Syntax
Enforce In-Order Execution of I/O   eieio
Instruction Synchronize             isync
2.3.5.3 Memory Control Instructions - VEA

Memory control instructions include the following types:

• Cache management instructions
• Segment register manipulation instructions
• Translation lookaside buffer management instructions

This section describes the user-level cache management instructions defined by the VEA. See Section 2.3.6.3, "Memory Control Instructions - OEA," for information about supervisor-level cache, segment register manipulation, and translation lookaside buffer management instructions.

The instructions listed in Table 2-34 provide user-level programs the ability to manage on-chip caches when they exist. As with other memory-related instructions, the effects of the cache management instructions on memory are weakly ordered. If the programmer needs to ensure that cache or other instructions have been performed with respect to all other processors and system mechanisms, a sync instruction must be placed in the program following those instructions.

Note that when data address translation is disabled (MSR[DR] = 0), the data cache block set to zero (dcbz) instruction allocates a cache block in the cache and may not verify that the physical address is valid. If a cache block is created for an invalid physical address, a machine check condition may result when an attempt is made to write that cache block back to memory. The cache block could be written back either as a result of executing an instruction that causes a cache miss for which the invalidly addressed cache block is the replacement target, or as a result of a data cache block store (dcbst) instruction.

Note that any cache control instruction that generates an effective address that corresponds to a direct-store segment (SR[T] = 1) is treated as a no-op.

Table 2-34 lists the cache instructions that are accessible to user-level programs.

Table 2-34. User-Level Cache Instructions

Name                                  Mnemonic    Operand Syntax
Data Cache Block Touch                dcbt        rA,rB
Data Cache Block Touch for Store      dcbtst      rA,rB
Data Cache Block Set to Zero          dcbz        rA,rB
Data Cache Block Store                dcbst       rA,rB
Data Cache Block Flush                dcbf        rA,rB
Instruction Cache Block Invalidate    icbi        rA,rB
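Each instruction in Table 2-34 acts on a single cache block, so software that needs to cover an address range has to visit every block the range touches. The sketch below only plans such a sequence and does not execute any cache operation. The function names are hypothetical, and the 32-byte block size is an assumption that is not stated in this section. The ordering follows the dcbst, sync, icbi, isync sequence given in Section 2.3.4.3.1 for self-modifying code.

```python
CACHE_BLOCK = 32  # assumed cache block size in bytes (not stated in this section)


def cache_block_addresses(start, length):
    """Return the base address of every cache block overlapping the range."""
    if length <= 0:
        return []
    first = start & ~(CACHE_BLOCK - 1)
    last = (start + length - 1) & ~(CACHE_BLOCK - 1)
    return list(range(first, last + 1, CACHE_BLOCK))


def plan_code_update(start, length):
    """List (mnemonic, address) steps to make modified code visible to fetch."""
    blocks = cache_block_addresses(start, length)
    steps = [("dcbst", b) for b in blocks]   # push modified data to memory
    steps.append(("sync", None))             # wait for the updates
    steps += [("icbi", b) for b in blocks]   # invalidate instruction cache copies
    steps.append(("isync", None))            # discard prefetched instructions
    return steps
```

For a 0x30-byte region starting at 0x1010, `cache_block_addresses` returns 0x1000 and 0x1020, so the plan contains two dcbst steps, a sync, two icbi steps, and a final isync.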
2.3.5.4 External Control Instructions

The external control instructions allow a user-level program to communicate with a special-purpose device. The MMU translation of the EA is not used to select the special-purpose device, as it is used in most instructions such as loads and stores. The EA is used instead as an address operand that is passed to the device over the address bus. Four other signals (the burst and size signals on the 60x bus) are used to select the device; these four signals output the 4-bit resource ID (RID) field that is located in the EAR register. Executing these instructions when MSR[DR] = 0 causes a programming error, and the physical address on the bus is undefined. Executing these instructions to a direct-store segment causes a DSI exception. The external control instructions are listed in Table 2-35.

Table 2-35. External Control Instructions

Name                                 Mnemonic    Operand Syntax
External Control In Word Indexed     eciwx       rD,rA,rB
External Control Out Word Indexed    ecowx       rS,rA,rB

2.3.6 PowerPC OEA Instructions

The PowerPC OEA includes the structure of the memory management model, supervisor-level registers, and the exception model.

2.3.6.1 System Linkage Instructions

This section describes the system linkage instructions (see Table 2-36). The sc instruction is a user-level instruction that permits a user program to call on the system to perform a service and causes the processor to take an exception. The return from interrupt (rfi) instruction is a supervisor-level instruction that is useful for returning from an exception handler.

Table 2-36. System Linkage Instructions

Name                     Mnemonic    Operand Syntax
System Call              sc
Return from Interrupt    rfi

2.3.6.2 Processor Control Instructions - OEA

Processor control instructions are used to read from and write to the condition register (CR), machine state register (MSR), and special-purpose registers (SPRs), and to read from the time base register (TBU or TBL).
2.3.6.2.1 Move to/from Machine State Register Instructions

Table 2-37 lists the instructions provided by the 603e for reading from or writing to the MSR.

Table 2-37. Move to/from Machine State Register Instructions

Name                                Mnemonic    Operand Syntax
Move to Machine State Register      mtmsr       rS
Move from Machine State Register    mfmsr       rD

2.3.6.2.2 Move to/from Special-Purpose Register Instructions

Simplified mnemonics are provided for the mtspr and mfspr instructions so they can be coded with the SPR name as part of the mnemonic rather than as a numeric operand. See Appendix F, "Simplified Mnemonics," in the Programming Environments Manual for simplified mnemonic examples. The mtspr and mfspr instructions are shown in Table 2-38.

Table 2-38. Move to/from Special-Purpose Register Instructions

Name                                  Mnemonic    Operand Syntax
Move to Special-Purpose Register      mtspr       SPR,rS
Move from Special-Purpose Register    mfspr       rD,SPR

For mtspr and mfspr instructions, the SPR number coded in assembly language does not appear directly as a 10-bit binary number in the instruction. The number coded is split into two 5-bit halves that are reversed in the instruction encoding, with the high-order 5 bits appearing in bits 16-20 of the instruction encoding and the low-order 5 bits in bits 11-15.

If the SPR field contains any value other than one of the values shown in Table 2-39, either the program exception handler is invoked or the results are boundedly undefined.

Table 2-39. Implementation-Specific SPR Encodings (mfspr)

             SPR*
Decimal    spr[5-9]    spr[0-4]    Register Name
976        11110       10000       DMISS
977        11110       10001       DCMP
978        11110       10010       HASH1
979        11110       10011       HASH2
980        11110       10100       IMISS
981        11110       10101       ICMP
2-44 mpc603e & EC603E risc microprocessors user's manual motorola implementation note ?he 603e ignores the extended opcode differences between mftb and mfspr by ignoring tb[25] and treating both instructions identically. 2.3.6.3 memory control instructions?ea this section describes memory control instructions, which include the following types: cache management instructions segment register manipulation instructions translation lookaside buffer management instructions 2.3.6.3.1 supervisor-level cache management instruction table 2-40 lists the only supervisor-level cache management instruction. see section 2.3.5.3, ?emory control instructions?ea,?for a description of cache instructions that provide user-level programs the ability to manage the on-chip caches. if the effective address references a direct-store segment, the instruction is treated as a no-op. when data translation is disabled, msr[dr] = 0, the dcbz instruction establishes a block in the cache and may not verify that the physical address is valid. if a block is created for an invalid real address, a machine check exception may result when an attempt is made to write that block back to memory. the block could be written back as the result of the execution of an instruction that causes a cache miss and the invalid address block is the target for replacement or as the result of a dcbst instruction. 982 11110 10110 rpa 1008 11111 10000 hid0 1009 11111 10001 hid1 1010 11111 10010 iabr * note that the order of the two 5-bit halves of the spr number is reversed compared with actual instruction coding. for mtspr and mfspr instructions, the spr number coded in assembly language does not appear directly as a 10-bit binary number in the instruction. the number coded is split into two 5-bit halves that are reversed in the instruction, with the high-order 5 bits appearing in bits 16?0 of the instruction and the low-order 5 bits in bits 11?5. table 2-40. 
supervisor-level cache management instruction name mnemonic operand syntax data cache block invalidate dcbi r a ,r b table 2-39. implementation-specific spr encodings (mfspr) (continued) spr* register name decimal spr[5?] spr[0?]
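The reversed 5-bit halves described for mtspr and mfspr in section 2.3.6.2.2 can be checked with a short calculation. The following is a minimal sketch rather than an assembler: the helper names are hypothetical, and the extended opcodes 339 (mfspr) and 467 (mtspr) come from the PowerPC architecture definition, not from this section. Bit 0 is the most-significant bit, as in the manual.

```python
# Sketch: where the 10-bit SPR number lands in an mfspr/mtspr instruction word.
# Helper names are illustrative only; bit 0 is the most-significant bit.

def spr_field(spr: int) -> int:
    """Return the 10-bit SPR field (instruction bits 11-20) with halves swapped."""
    if not 0 <= spr < 1024:
        raise ValueError("SPR number must fit in 10 bits")
    high5 = (spr >> 5) & 0x1F   # high-order half -> instruction bits 16-20
    low5 = spr & 0x1F           # low-order half  -> instruction bits 11-15
    return (low5 << 5) | high5


def encode_mfspr(rd: int, spr: int) -> int:
    """Encode mfspr rD,SPR (primary opcode 31, extended opcode 339)."""
    return (31 << 26) | (rd << 21) | (spr_field(spr) << 11) | (339 << 1)


def encode_mtspr(spr: int, rs: int) -> int:
    """Encode mtspr SPR,rS (primary opcode 31, extended opcode 467)."""
    return (31 << 26) | (rs << 21) | (spr_field(spr) << 11) | (467 << 1)


# Print the encodings of "mfspr r3,1008" (hid0) and "mtspr 1008,r3".
print(f"mfspr r3,1008 -> 0x{encode_mfspr(3, 1008):08x}")
print(f"mtspr 1008,r3 -> 0x{encode_mtspr(1008, 3):08x}")
```

Swapping the halves inside one helper keeps the assembly-language SPR number (for example 1008 for hid0) as the only value a caller needs to supply.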
2.3.6.3.2 segment register manipulation instructions

the instructions listed in table 2-41 provide access to the segment registers for the 603e. these instructions operate completely independently of the msr[ir] and msr[dr] bit settings. refer to "synchronization requirements for special registers and tlbs" in chapter 2, "register set," in the programming environments manual for serialization requirements and other recommended precautions to observe when manipulating the segment registers.

table 2-41. segment register manipulation instructions
name                                  mnemonic   operand syntax
move to segment register              mtsr       sr,rs
move to segment register indirect     mtsrin     rs,rb
move from segment register            mfsr       rd,sr
move from segment register indirect   mfsrin     rd,rb

2.3.6.3.3 translation lookaside buffer management instructions

the address translation mechanism is defined in terms of segment descriptors and page table entries (ptes) used by powerpc processors to locate the effective-to-physical address mapping for a particular access. the ptes reside in page tables in memory. as defined for 32-bit implementations by the powerpc architecture, segment descriptors reside in 16 on-chip segment registers.

implementation note: the 603e provides the ability to invalidate a tlb entry. the tlb invalidate entry (tlbie) instruction invalidates the tlb entry indexed by the ea, and operates on both the instruction and data tlbs simultaneously, invalidating four tlb entries (both sets in each tlb). the index corresponds to bits 15–19 of the ea. to invalidate all entries within both tlbs, 32 tlbie instructions should be issued, incrementing this field by one each time.

the 603e provides two implementation-specific instructions (tlbld and tlbli) that are used by software table search operations following tlb misses to load tlb entries on-chip. for more information on tlbld and tlbli refer to section 2.3.8, "implementation-specific instructions."

note that the tlbia instruction is not implemented on the 603e.
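The 32-step tlbie sweep described in the implementation note of section 2.3.6.3.3 amounts to stepping ea[15–19] through every value. The sketch below only computes the effective addresses a supervisor routine would place in rB. The function names are hypothetical, and the shift of 12 follows from bit 19 sitting 12 positions above the least-significant bit of a 32-bit address.

```python
# Sketch: effective addresses for a full tlbie sweep on the 603e.
# ea[15-19] (bit 0 = most-significant bit) selects the TLB entry.

TLB_INDEX_SHIFT = 12   # bit 19 is 31 - 19 = 12 positions above the LSB
TLB_INDEX_COUNT = 32   # five index bits give 32 distinct values


def tlb_index(ea: int) -> int:
    """Extract the ea[15-19] field that tlbie uses as its index."""
    return (ea >> TLB_INDEX_SHIFT) & (TLB_INDEX_COUNT - 1)


def tlbie_sweep_addresses() -> list[int]:
    """Return one effective address per index value, in increasing order."""
    return [index << TLB_INDEX_SHIFT for index in range(TLB_INDEX_COUNT)]


for ea in tlbie_sweep_addresses():
    print(f"tlbie with rb = 0x{ea:08x} (index {tlb_index(ea)})")
```

Because the index field starts at the 4-kbyte page boundary, stepping the address by one page per tlbie visits each index exactly once.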
refer to chapter 5, "memory management," for more information about the tlb operations for the 603e. table 2-42 lists the tlb instructions.

table 2-42. translation lookaside buffer management instructions
name                                  mnemonic   operand syntax
tlb invalidate entry                  tlbie      rb
tlb synchronize                       tlbsync
load data tlb entry                   tlbld      rb
load instruction tlb entry            tlbli      rb

because the presence and exact semantics of the translation lookaside buffer management instructions are implementation-dependent, system software should incorporate uses of the instructions into subroutines to maximize compatibility with programs written for other processors.

for more information on the powerpc instruction set, refer to chapter 4, "addressing modes and instruction set summary," and chapter 8, "instruction set," in the programming environments manual.

2.3.7 recommended simplified mnemonics

to simplify assembly language programs, a set of simplified mnemonics is provided for some of the most frequently used operations (such as no-op, load immediate, load address, move register, and complement register). powerpc compliant assemblers provide the simplified mnemonics listed in "recommended simplified mnemonics" in appendix f, "simplified mnemonics," in the programming environments manual and listed with some of the instruction descriptions in this chapter. programs written to be portable across the various assemblers for the powerpc architecture should not assume the existence of mnemonics not described in this document.

for a complete list of simplified mnemonics, see appendix f, "simplified mnemonics," in the programming environments manual.

2.3.8 implementation-specific instructions

this section provides a detailed look at the two 603e implementation-specific instructions, tlbld and tlbli.
tlbld: load data tlb entry (integer unit)

syntax: tlbld rb

instruction encoding (reserved fields are zero):
bits 0–5     31
bits 6–10    0 0 0 0 0
bits 11–15   0 0 0 0 0
bits 16–20   b
bits 21–30   978
bit 31       0

operation:
ea ← (rb)
tlb entry created from dcmp and rpa
dtlb entry selected by ea[15-19] and srr1[way] ← created tlb entry

the ea is the contents of rb. the tlbld instruction loads the contents of the data pte compare (dcmp) and required physical address (rpa) registers into the first word of the selected data tlb entry. the specific dtlb entry to be loaded is selected by the ea and the srr1[way] bit.

the tlbld instruction should only be executed when address translation is disabled (msr[ir] = 0 and msr[dr] = 0). note that it is possible to execute the tlbld instruction when address translation is enabled; however, extreme caution should be used in doing so. if data address translation is set (msr[dr] = 1), tlbld must be preceded by a sync instruction and succeeded by a context synchronizing instruction.

note also that care should be taken to avoid modification of the instruction tlb entries that translate current instruction prefetch addresses.

this is a supervisor-level instruction; it is also a 603e-specific instruction, and not part of the powerpc instruction set.

other registers altered: none
tlbli: load instruction tlb entry (integer unit)

syntax: tlbli rb

instruction encoding (reserved fields are zero):
bits 0–5     31
bits 6–10    0 0 0 0 0
bits 11–15   0 0 0 0 0
bits 16–20   b
bits 21–30   1010
bit 31       0

operation:
ea ← (rb)
tlb entry created from icmp and rpa
itlb entry selected by ea[15-19] and srr1[way] ← created tlb entry

the ea is the contents of rb. the tlbli instruction loads the contents of the instruction pte compare (icmp) and required physical address (rpa) registers into the first word of the selected instruction tlb entry. the specific itlb entry to be loaded is selected by the ea and the srr1[way] bit.

the tlbli instruction should only be executed when address translation is disabled (msr[ir] = 0 and msr[dr] = 0). note that it is possible to execute the tlbli instruction when address translation is enabled; however, extreme caution should be used in doing so. if instruction address translation is set (msr[ir] = 1), tlbli must be followed by a context synchronizing instruction such as isync or rfi.

note also that care should be taken to avoid modification of the instruction tlb entries that translate current instruction prefetch addresses.

this is a supervisor-level instruction; it is also a 603e-specific instruction, and not part of the powerpc instruction set.

other registers altered: none
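The encoding layouts given for tlbld and tlbli differ only in the extended opcode field (978 versus 1010 in bits 21–30). As a minimal sketch, assuming the field positions listed in the two instruction descriptions and using hypothetical helper names, the instruction words can be assembled as follows.

```python
# Sketch: assembling tlbld/tlbli instruction words from their field layout.
# Bit 0 is the most-significant bit; reserved fields are left as zero.

TLBLD_XO = 978    # extended opcode of tlbld (bits 21-30)
TLBLI_XO = 1010   # extended opcode of tlbli (bits 21-30)


def encode_tlb_load(rb: int, extended_opcode: int) -> int:
    """Place primary opcode 31, rB (bits 16-20), and the extended opcode."""
    if not 0 <= rb < 32:
        raise ValueError("rB must name one of the 32 GPRs")
    return (31 << 26) | (rb << 11) | (extended_opcode << 1)


def encode_tlbld(rb: int) -> int:
    return encode_tlb_load(rb, TLBLD_XO)


def encode_tlbli(rb: int) -> int:
    return encode_tlb_load(rb, TLBLI_XO)


print(f"tlbld r3 -> 0x{encode_tlbld(3):08x}")
print(f"tlbli r3 -> 0x{encode_tlbli(3):08x}")
```

A helper like this can be useful when an assembler does not recognize the 603e-specific mnemonics and the words have to be emitted as data.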
chapter 3. instruction and data cache operation

the powerpc 603e microprocessor provides two 16-kbyte, four-way set associative caches to allow the registers and execution units rapid access to instructions and data. both the instruction and data caches are tightly coupled to the 603e's bus interface unit (biu) to allow efficient access to the system memory controller and other bus masters. the 603e's load/store unit (lsu) is also directly coupled to the data cache to allow the efficient movement of data to and from the general-purpose and floating-point registers. (the floating-point register file is not supported on the EC603E microprocessor.)

both the instruction and data caches have a block size of 32 bytes, and the data cache blocks can be snooped, or cast out when the cache block is reloaded. the data cache is designed to adhere to a write-back policy, but the 603e allows control of cacheability, write-back policy, and memory coherency at the page and block level. both caches use a least recently used (lru) replacement policy. burst fill operations to the caches result from cache misses, or in the case of the data cache, cache block write-back operations to memory.

note that in the powerpc architecture, the term "cache block," or simply "block," when used in the context of cache implementations, refers to the unit of memory at which coherency is maintained. for the 603e, the block size is equivalent to the eight-word cache line. this value may be different for other powerpc implementations.

the data cache is configured as 128 sets of four blocks. each block consists of 32 bytes, two state bits, and an address tag. the two state bits implement the three-state mei (modified/exclusive/invalid) protocol, a coherent subset of the standard four-state mesi protocol. cache coherency is enforced by on-chip bus snooping logic.
since the 603e's data cache tags are single-ported, a simultaneous load or store and snoop access represent a resource contention. the snoop access is given first access to the tags. load or store operations can be performed to the cache on the clock cycle immediately following a snoop access if the snoop misses; snoop hits may block the data cache for two or more cycles, depending on whether a copyback to main memory is required.

the instruction cache also consists of 128 sets of four blocks, and each block consists of 32 bytes, an address tag, and a valid bit. the instruction cache is only written as a result of a block fill operation on a cache miss. in the pid7v-603e, the instruction cache is blocked only until the critical load completes. the pid7v-603e supports instruction fetching from other instruction cache lines following the forwarding of the critical first double word of a cache line load operation. successive instruction fetches from the cache line being loaded
are forwarded, and accesses to other instruction cache lines can proceed during the cache line load operation. the instruction cache is not snooped, and cache coherency must be maintained by software. a fast hardware invalidation capability is provided to support cache maintenance.

the load/store unit provides the data transfer interface between the data cache and the gprs and the fprs (not supported by the EC603E microprocessor). the load/store unit provides all logic required to calculate effective addresses, handles data alignment to and from the data cache, and provides sequencing for load and store string and multiple operations. as shown in figure 1-1, the caches provide a 64-bit interface to the instruction fetcher and load/store unit. write operations to the data cache can be performed on a byte, half-word, word, or double-word basis.

the 603e's bus interface unit receives requests for bus operations from the instruction and data caches, and executes the operations according to the 603e bus protocol. the biu provides address queues, prioritization, and bus control logic. the biu also captures snoop addresses for data cache, address queue, and memory reservation (lwarx and stwcx. instructions) operations. the biu also contains a touch load address buffer used for address compares during load or store operations. all the data for the corresponding address queues (load and store data queues) is located in the data cache. the data queues are considered temporary storage for the cache and not part of the biu.

on a cache miss, the 603e's cache blocks are loaded in four beats of 64 bits each when the 603e is configured with a 64-bit data bus; when the 603e is configured with a 32-bit bus, cache block loads are performed with eight beats of 32 bits each. the burst load is performed as critical double word first.
the data cache is blocked to internal accesses until the load completes; the instruction cache allows sequential fetching during a cache block load. in the pid7v-603e, the critical double word is simultaneously written to the cache and forwarded to the requesting unit, thus minimizing stalls due to load delays. note that the cache being filled cannot be accessed internally until the fill completes.

when address translation is enabled, the memory access is performed under the control of the page table entry used to translate the effective address. each page table entry contains four mode control bits, w, i, m, and g, that specify the storage mode for all accesses translated using that particular page table entry. the w (write-through) and i (caching-inhibited) bits control how the processor executing the access uses its own cache. the m (memory coherence) bit specifies whether the processor executing the access must use the mei (modified, exclusive, or invalid) cache coherence protocol to ensure all copies of the addressed memory location are kept consistent. the g (guarded memory) bit controls whether out-of-order data and instruction fetching is permitted.

the 603e maintains data cache coherency in hardware by coordinating activity between the data cache, the memory system, and the bus interface logic. as bus operations are performed on the bus by other bus masters, the 603e bus snooping logic monitors the addresses that are referenced. these addresses are compared with the addresses resident in the data cache. if there is a snoop hit, the 603e's bus snooping logic responds to the bus
interface with the appropriate snoop status (for example, an artry). additional snoop action may be forwarded to the cache as a result of a snoop hit in some cases (a cache push of modified data, or a cache block invalidation).

the 603e supports a fully-coherent 4-gbyte physical memory address space. bus snooping is used to drive the mei three-state cache-coherency protocol that ensures the coherency of global memory with respect to the processor's cache. the mei protocol is described in section 3.6.1, "mei state definitions."

this chapter describes the organization of the 603e's on-chip instruction and data caches, the mei cache coherency protocol, cache control instructions, various cache operations, and the interaction between the cache, load/store unit, and the bus interface unit. pid7v-603e specific information is noted where applicable.

3.1 instruction cache organization and control

the instruction fetcher accesses the instruction cache frequently in order to sustain the high throughput provided by the six-entry instruction dispatch queue.

3.1.1 instruction cache organization

the organization of the instruction cache is shown in figure 3-1. each cache block contains eight contiguous words from memory that are loaded from an 8-word boundary (that is, bits a27–a31 of the effective addresses are zero); thus, a cache block never crosses a page boundary. misaligned accesses across a page boundary can incur a performance penalty.

note that address bits a20–a26 provide an index to select a set. bits a27–a31 select a byte within a block. the tags consist of bits pa0–pa19. address translation occurs in parallel, such that higher-order bits (the tag bits in the cache) are physical. note that the replacement algorithm is strictly an lru algorithm; that is, the least recently used block is filled with new instructions on a cache miss.

figure 3-1.
instruction cache organization
(figure: 128 sets, each holding block 0 through block 3; each block has an address tag, a state field, and words 0–7, for 8 words per block)
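The address split described in section 3.1.1 (tag in pa0–pa19, set index in a20–a26, byte offset in a27–a31) can be written as three mask-and-shift operations. This is a minimal sketch with hypothetical names; it only restates the geometry given in the text and does not model tag comparison or replacement.

```python
# Sketch: splitting a 32-bit physical address into the 603e cache fields.
# Bit 0 is the most-significant bit, so a27-a31 are the five low-order bits.

BLOCK_BYTES = 32   # eight words per block
SETS = 128         # selected by a20-a26
WAYS = 4           # four blocks per set

# 128 sets x 4 blocks x 32 bytes gives the stated 16-kbyte capacity.
assert SETS * WAYS * BLOCK_BYTES == 16 * 1024


def split_address(pa: int) -> tuple[int, int, int]:
    """Return (tag, set_index, byte_offset) for a physical address."""
    byte_offset = pa & (BLOCK_BYTES - 1)   # a27-a31
    set_index = (pa >> 5) & (SETS - 1)     # a20-a26
    tag = (pa >> 12) & 0xFFFFF             # pa0-pa19
    return tag, set_index, byte_offset


tag, set_index, byte_offset = split_address(0x12345678)
print(f"tag=0x{tag:05x} set={set_index} offset={byte_offset}")
```

The index and offset together cover the low 12 address bits, which lie inside a 4-kbyte page. This is consistent with the statement that translation proceeds in parallel and only the tag bits need to be physical.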
3.1.2 instruction cache fill operations

the 603e's instruction cache blocks are loaded in four beats of 64 bits each, with the critical double word loaded first. the instruction cache allows sequential fetching during a cache block load. on a cache miss, the critical and following double words read from memory are simultaneously written to the instruction cache and forwarded to the dispatch queue, thus minimizing stalls due to cache fill latency. there is no snooping of the instruction cache. in the pid7v-603e, the critical double word is simultaneously written to the cache and forwarded to the requesting unit, thus minimizing stalls due to load delays.

3.1.3 instruction cache control

in addition to instruction cache control instructions, the 603e provides several control bits in the hid0 register for the control of invalidating, disabling, and locking the instruction cache. in addition, the wimg bits in the page tables also affect the cacheability of pages and whether or not the pages are considered guarded.

3.1.3.1 instruction cache invalidation

while the 603e's instruction cache is automatically invalidated during a power-on or hard reset, assertion of the soft reset signal does not cause instruction cache invalidation. software may invalidate the contents of the instruction cache using the instruction cache flash invalidate (icfi) control bit in the hid0 register. flash invalidation of the instruction cache is accomplished by setting and clearing the icfi bit with two consecutive move to spr operations to the hid0 register.

3.1.3.2 instruction cache disabling

the instruction cache may be disabled through the use of the instruction cache enable (ice) control bit in the hid0 register. when the instruction cache is in the disabled state, the cache tag state bits are ignored, and all accesses are propagated to the bus as single-beat transactions.
the ice bit is cleared during a power-on reset, causing the instruction cache to be disabled. the setting of the ice bit must be preceded by an isync instruction to prevent the cache from being enabled or disabled while an instruction access is in progress.

3.1.3.3 instruction cache locking

the contents of the instruction cache may be locked through the use of the ilock control bit in the hid0 register. a locked instruction cache supplies instructions normally on a cache hit, but cache misses are treated as cache-inhibited accesses. the cache inhibited (ci) signal is asserted if a cache access misses into a locked cache. the setting of the ilock bit in hid0 must be preceded by an isync instruction to prevent the instruction cache from being locked during an instruction access.
3.2 data cache organization and control

the data cache supplies data to the gprs and fprs (not supported on the EC603E microprocessor) by means of the load/store unit, and provides buffers for load and store bus operations. the data cache also provides storage for the cache tags required for memory coherency and performs the cache block replacement lru function.

3.2.1 data cache organization

the organization of the data cache is shown in figure 3-2. each cache block contains eight contiguous words from memory that are loaded from an 8-word boundary (that is, bits a27–a31 of the effective addresses are zero); thus, a cache block never crosses a page boundary. misaligned accesses across a page boundary can incur a performance penalty.

note that address bits a20–a26 provide an index to select a set. bits a27–a31 select a byte within a block. the tags consist of bits pa0–pa19. address translation occurs in parallel, such that higher-order bits (the tag bits in the cache) are physical. note that the replacement algorithm is strictly an lru algorithm; that is, the least recently used block is filled with new data on a cache miss.

figure 3-2. data cache organization
(figure: 128 sets, each holding block 0 through block 3; each block has an address tag, a state field, and words 0–7, for 8 words per block)

3.2.2 data cache fill operations

the 603e's cache blocks are loaded in four beats of 64 bits each when the 603e is configured with a 64-bit data bus; when the 603e is configured with a 32-bit bus, cache block loads are performed with eight beats of 32 bits each. the burst load is performed as critical double word first. the data cache is blocked to internal accesses until the load completes. in the pid7v-603e, the critical double word is simultaneously written to the cache and forwarded to the requesting unit, thus minimizing stalls due to load delays.
3.2.3 data cache control

the 603e provides several means of data cache control through the use of the wimg bits in the page tables, control bits in the hid0 register, and user- and supervisor-level cache control instructions. while memory page level cache control is provided by the wimg bits, the on-chip data cache can be invalidated, disabled, locked, or broadcast by the control bits in the hid0 register described in this section. (note that user- and supervisor-level are referred to as problem and privileged state, respectively, in the architecture specification.)

3.2.3.1 data cache invalidation

while the data cache is automatically invalidated when the 603e is powered up and during a hard reset, assertion of the soft reset signal does not cause data cache invalidation. software may invalidate the contents of the data cache using the data cache flash invalidate (dcfi) control bit in the hid0 register. flash invalidation of the data cache is accomplished by setting and clearing the dcfi bit in two consecutive store operations.

3.2.3.2 data cache disabling

the data cache may be disabled through the use of the data cache enable (dce) control bit in the hid0 register. when the data cache is in the disabled state, the cache tag state bits are ignored, and all accesses are propagated to the bus as single-beat transactions. the dce bit is cleared on power-up, causing the data cache to be disabled. the setting of the dce bit must be preceded by a sync instruction to prevent the cache from being enabled or disabled in the middle of a data access. note that while snooping is not performed when the data cache is disabled, cache operations (caused by the dcbz, dcbf, dcbst, and dcbi instructions) are not affected by disabling the cache, causing potential coherency errors.
an example of this would be a dcbf instruction that hits a modified cache block in the disabled cache, causing a copyback to memory of potentially stale data.

regardless of the state of hid0[dce], load and store operations are assumed to be weakly ordered. thus the lsu can perform load operations that occur later in the program ahead of store operations, even when the data cache is disabled. however, strongly ordered load and store operations can be enforced through the setting of the i bit (of the page wimg bits) when address translation is enabled. note that when address translation is disabled, the default wimg bits cause the i bit to be cleared (accesses are assumed to be cacheable), and thus the accesses are weakly ordered. refer to section 3.5.2, "caching-inhibited attribute (i)," for a description of the operation of the i bit and section 5.2, "real addressing mode," for a description of the wimg bits when address translation is disabled.

3.2.3.3 data cache locking

the contents of the data cache may be locked through the use of the dlock control bit in the hid0 register. a locked data cache supplies data normally on a cache hit, but cache misses are treated as cache-inhibited accesses. the cache inhibited (ci) signal is asserted if
a cache access misses into a locked cache. the setting of the dlock bit in hid0 must be preceded by a sync instruction to prevent the data cache from being locked during a data access.

3.2.3.4 data cache operations and address broadcasts

the execution of a dcbz instruction results in an address-only broadcast on the bus. additionally, if the hid0[abe] bit is set on a pid7v-603e processor, the execution of the dcbf, dcbi, and dcbst instructions will also cause an address-only broadcast. the ability of the pid7v-603e to optionally perform address-only broadcasts when executing the dcbi, dcbf, and dcbst instructions allows the coherency management of an external copyback l2 cache. note that these cache control instruction broadcasts are not snooped by the pid7v-603e.

3.2.4 data cache touch load support

touch load operations allow an instruction stream to prefetch data from memory prior to a cache miss. the 603e supports touch load operations through a temporary cache block buffer located between the biu and the data cache. the cache block buffer is essentially a floating cache block that is loaded by the biu on a touch load operation, and is then read by a load instruction that requests that data.

after a touch load completes on the bus, the biu continues to compare the touch load address with subsequent load requests from the data cache. if the load address matches the touch load address in the biu, the data is forwarded to the data cache from the touch load buffer, the read from memory is canceled, and the touch load address buffer is invalidated.

to avoid the storage of stale data in the touch load buffer, touch load requests that are mapped as write-through or caching-inhibited by the mmu are treated as no-ops by the biu.
also, subsequent load instructions after a touch load that are mapped as write-through or caching-inhibited do not hit in the touch load buffer, and cause the touch load buffer to be invalidated on a matching address.

while the 603e provides only a single cache block buffer, other powerpc microprocessor implementations may provide buffering for more than one cache block. programs written for other implementations may issue several dcbt or dcbtst instructions sequentially, reducing performance if executed on the 603e. to improve performance in these situations, the noopti bit (bit 31) in the hid0 register may be set. this causes the dcbt and dcbtst instructions to be treated as no-ops that cause no bus activity and incur only one processor clock cycle of execution latency. the noopti bit is cleared after a power-on reset operation, enabling the use of the dcbt and dcbtst instructions.
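Because the manual numbers register bits from the most-significant end, "bit 31" of hid0 is the least-significant bit of the 32-bit value. The sketch below shows that conversion for the noopti bit only. The helper names are hypothetical, and on real hardware the resulting value would be written with mtspr from supervisor-level code rather than computed in Python.

```python
# Sketch: turning a manual-style bit number (bit 0 = MSB) into a mask.

def hid0_mask(bit: int) -> int:
    """Return the 32-bit mask for a hid0 bit numbered as in the manual."""
    if not 0 <= bit <= 31:
        raise ValueError("hid0 has bits 0 through 31")
    return 1 << (31 - bit)


NOOPTI = hid0_mask(31)   # noopti is bit 31 according to section 3.2.4


def set_noopti(hid0: int) -> int:
    """Return hid0 with noopti set (dcbt/dcbtst treated as no-ops)."""
    return (hid0 | NOOPTI) & 0xFFFFFFFF


def clear_noopti(hid0: int) -> int:
    """Return hid0 with noopti cleared (touch loads enabled)."""
    return hid0 & ~NOOPTI & 0xFFFFFFFF


print(f"noopti mask = 0x{NOOPTI:08x}")
```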
3.3 basic data cache operations

this section describes the three types of operations that can occur to the data cache, and how these operations are implemented in the 603e.

3.3.1 data cache fill

a cache block is filled after a read miss or write miss (read-with-intent-to-modify) occurs in the cache. the cache block that corresponds to the missed address is updated by a burst transfer of the data from system memory. note that if a read miss occurs in a system with multiple bus masters, and the data is modified in another cache, the modified data is first written to external memory before the cache fill occurs.

3.3.2 data cache cast-out operation

the 603e uses an lru replacement algorithm to determine which of the four possible cache locations should be used for a cache update on a cache miss. adding a new block to the cache causes any modified data associated with the least recently used element to be written back, or cast out, to system memory to maintain memory coherence.

3.3.3 cache block push operation

when a cache block in the 603e is snooped and hit by another bus master and the data is modified, the cache block must be written to memory and made available to the snooping device. the cache block that is hit is said to be pushed out onto the bus. the 603e supports two kinds of push operations: normal push operations and enveloped high-priority push operations, which are described in section 3.6.9, "enveloped high-priority cache block push operation."

3.4 data cache transactions on bus

the 603e transfers data to and from the data cache in single-beat transactions of two words, or in four-beat transactions of eight words which fill a cache block.

3.4.1 single-beat transactions

single-beat bus transactions can transfer from one to eight bytes to or from the 603e.
single-beat transactions can be caused by cache write-through accesses, caching-inhibited accesses (i bit of the wimg bits for the page is set), or accesses when the cache is disabled (hid0[dce] bit is cleared), and can be misaligned.

3.4.2 burst transactions

burst transactions on the 603e always transfer eight words of data at a time, and are aligned to a double-word boundary. the 603e transfer burst (tbst) output signal indicates to the system whether the current transaction is a single-beat transaction or four-beat burst transfer. burst transactions have an assumed address order. for cacheable read operations
or cacheable, non-write-through write operations that miss the cache, the 603e presents the double-word aligned address associated with the load or store instruction that initiated the transaction. as shown in figure 3-3, this double word contains the address of the load or store that missed the cache. this minimizes latency by allowing the critical code or data to be forwarded to the processor before the rest of the block is filled. for all other burst operations, however, the entire block is transferred in order (oct-word aligned). critical-double-word-first fetching on a cache miss applies to both the data and instruction cache.

figure 3-3. double-word address ordering: critical double word first

603e cache address bits (27–28)   00   01   10   11
double word                       a    b    c    d

if the address requested is in double word a, the address placed on the bus is that of double word a, and the four data beats are ordered in the following manner:
beat          0    1    2    3
double word   a    b    c    d

if the address requested is in double word c, the address placed on the bus is that of double word c, and the four data beats are ordered in the following manner:
beat          0    1    2    3
double word   c    d    a    b

3.4.3 access to direct-store segments

the 603e does not provide support for access to direct-store segments. operations attempting to access a direct-store segment will invoke a dsi exception. for additional information about dsi exceptions, refer to section 4.5.3, "dsi exception (0x00300)."
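The beat ordering in figure 3-3 can be expressed as a rotation of the four double words that starts at the one containing the requested address. The following sketch uses hypothetical names and assumes sequential wrap-around for double words b and d, which is extrapolated from the two cases (a and c) that the figure shows. The beat counts follow the 64-bit and 32-bit bus cases given in section 3.2.2.

```python
# Sketch: critical-double-word-first ordering for a 32-byte cache block.
# Address bits 27-28 (bit 0 = MSB) pick one of the four double words a-d.

BLOCK_BYTES = 32
DOUBLE_WORDS = "abcd"


def critical_double_word(address: int) -> int:
    """Return 0-3 for the double word that holds the requested byte."""
    return (address >> 3) & 0x3


def burst_order(address: int) -> list[str]:
    """Return the double words in the order the four 64-bit beats deliver them."""
    first = critical_double_word(address)
    return [DOUBLE_WORDS[(first + beat) % 4] for beat in range(4)]


def beats_per_block(bus_width_bits: int) -> int:
    """Return the number of data beats needed to move one cache block."""
    return BLOCK_BYTES * 8 // bus_width_bits


print(burst_order(0x1014))   # requested byte lies in double word c
print(beats_per_block(64), beats_per_block(32))
```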
3.5 memory management/cache access mode bits w, i, m, and g

some memory characteristics can be set on either a block or page basis by using the wimg bits in the bat registers or page table entry (pte) respectively. the wimg attributes control the following functionality: write-through (w bit), caching-inhibited (i bit), memory coherency (m bit), and guarded memory (g bit).

these bits allow both uniprocessor and multiprocessor system designs to exploit numerous system-level performance optimizations. careless specification and use of these bits may create situations where coherency paradoxes are observed by the processor. in particular, this can happen when the state of these bits is changed without appropriate precautions being taken (for example, when flushing the pages that correspond to the changed bits from the caches of all processors in the system is required, or when the address translations of aliased physical addresses (referred to as real addresses in the architecture specification) specify different values for any of the wim bits). the 603e considers either of these cases to be a programming error which may compromise the coherency of memory. these paradoxes can occur within a single processor or across several devices, as described in section 3.6.4.1, "coherency in single-processor systems."

the wimg attributes are programmed by the operating system for each page and block. the w and i attributes control how the processor performing an access uses its own cache. the m attribute ensures that coherency is maintained for all copies of the addressed memory location. the g attribute prevents out-of-order loading and prefetching from the addressed memory location.

when an access requires coherency, the processor performing the access must inform the coherency mechanisms throughout the system that the access requires memory coherency.
the m attribute determines the kind of access performed on the bus (global or local).

the wimg attributes occupy four bits in the bat registers for block address translation and in the ptes for page address translation. the wimg bits are programmed as follows:
the operating system uses the mtspr instruction to program the wimg bits in the bat registers for block address translation. the ibat register pairs do not have a g bit and all accesses that use the ibat register pairs are considered not guarded.
the operating system writes the wimg bits for each page into the ptes in system memory as it sets up the page tables.
note that for accesses performed with direct address translation (msr[ir] = 0 or msr[dr] = 0 for instruction or data access, respectively), the wimg bits are automatically generated as 0b0011 (the data is write-back, caching is enabled, memory coherency is enforced, and memory is guarded).

3.5.1 write-through attribute (w)

when an access is designated as write-through (w = 1), if the data is in the cache, a store operation updates the cached copy of the data. in addition, the update is written to the external memory location (as described below). while the powerpc architecture permits multiple store instructions to be combined for write-through accesses except when the store instructions are separated by a sync or eieio instruction, the 603e does not implement this "combined store" capability. note that a store operation that uses the write-through attribute may cause any part of valid data in the cache to be written back to main memory.

the definition of the external memory location to be written to in addition to the on-chip cache depends on the implementation of the memory system but can be illustrated by the following examples:

• ram: the store is sent to the ram controller to be written into the target ram.
• i/o device: the store is sent to the memory-mapped i/o control hardware to be written to the target register or memory location.

in systems with multilevel caching, the store must be written to at least a depth in the memory hierarchy that is seen by all processors and devices.

accesses that correspond to w = 0 are considered write-back. for this case, although the store operation is performed to the cache, it is only made to external memory when a copy-back operation is required. use of the write-back mode (w = 0) can improve overall performance for areas of the memory space that are seldom referenced by other masters in the system.
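the difference between the two store policies can be pictured with a small python model. it is a sketch only: the single-level cache, the dictionary standing in for external memory, and the names store and copy_back are illustrative assumptions, and a write-back store miss simply allocates an entry without modeling the block fill.

```python
def store(cache, memory, addr, value, w):
    """Perform a store under write-through (w=1) or write-back (w=0)."""
    if w:
        # Write-through: update the cached copy on a hit, and always
        # send the store to external memory as well.
        if addr in cache:
            cache[addr]["data"] = value
        memory[addr] = value
    else:
        # Write-back: the store is performed to the cache only; the
        # entry stays modified until a copy-back is required.
        cache[addr] = {"data": value, "modified": True}


def copy_back(cache, memory, addr):
    """Write a modified cache entry out to external memory."""
    entry = cache.get(addr)
    if entry and entry["modified"]:
        memory[addr] = entry["data"]
        entry["modified"] = False
```

with w = 0, external memory keeps its old value until copy_back runs, which is why write-back mode suits areas of memory that other masters seldom reference.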
3.5.2 caching-inhibited attribute (i)

if i = 1, the memory access is completed by referencing the location in main memory, bypassing the on-chip cache. during the access, the addressed location is not loaded into the cache nor is the location allocated in the cache. it is considered a programming error if a copy of the target location of an access to caching-inhibited memory is resident in the cache. software must ensure that the location has not been previously loaded into the cache, or, if it has, that it has been flushed from the cache.

the powerpc architecture permits data accesses from more than one instruction to be combined for cache-inhibited operations, except when the accesses are separated by a sync instruction, or by an eieio instruction when the page or block is also designated as guarded.
this "combined access" capability is not implemented on the 603e. note that the eieio instruction is treated as a no-op by the 603e.

the caching-inhibited (i) bit in the 603e controls whether load and store operations are strongly or weakly ordered. if an i/o device requires load and store accesses to occur in program order, then the i bit for the page must be set.

3.5.3 memory coherency attribute (m)

this attribute is provided to allow improved performance in systems where hardware-enforced coherency is relatively slow, and software is able to enforce the required coherency. when m = 0, the processor does not enforce data coherency. when m = 1, the processor enforces data coherency and the corresponding access is considered to be a global access.

when the m attribute is set, and the access is performed, the global signal is asserted to indicate that the access is global. snooping devices affected by the access must then respond to this global access if their data is modified by asserting artry, and updating the memory location.

because instruction memory does not have to be consistent with data memory, the 603e ignores the m attribute for instruction accesses.

3.5.4 guarded attribute (g)

when the guarded bit is set, the memory area (block or page) is designated as guarded, meaning that the processor will perform out-of-order accesses to this area of memory, only as follows:

• out-of-order load operations from guarded memory areas are performed only if the corresponding data is resident in the cache.
• the processor prefetches from guarded areas, but only when required, and only within the memory boundary dictated by the cache block. that is, if an instruction is certain to be required for execution by the program, it is fetched and the remaining instructions in the block may be prefetched, even if the area is guarded.
this setting can be used to protect certain memory areas from read accesses made by the processor that are not dictated directly by the program. if there are areas of memory that are not fully populated (in other words, there are holes in the memory map within this area), this setting can protect the system from undesired accesses caused by out-of-order load operations or instruction prefetches that could lead to the generation of the machine check exception. also, the guarded bit can be used to prevent out-of-order load operations or prefetches from occurring to certain peripheral devices that produce undesired results when accessed in this way.
3.5.5 w, i, and m bit combinations

table 3-1 summarizes the six combinations of the wim bits. note that either a zero or one setting for the g bit is allowed for each of these wim bit combinations.

table 3-1. combinations of w, i, and m bits

wim setting  meaning
000  data may be cached. loads or stores whose target hits in the cache use that entry in the cache. memory coherency is not enforced by hardware.
001  data may be cached. loads or stores whose target hits in the cache use that entry in the cache. memory coherency is enforced by hardware.
010  caching is inhibited. the access is performed to external memory, completely bypassing the cache. memory coherency is not enforced by hardware.
011  caching is inhibited. the access is performed to external memory, completely bypassing the cache. memory coherency must be enforced by external hardware (processor provides hardware indication that access is global).
100  data may be cached. load operations whose target hits in the cache use that entry in the cache. stores are written to external memory. the target location of the store may be cached and is updated on a hit. memory coherency is not enforced by hardware.
101  data may be cached. load operations whose target hits in the cache use that entry in the cache. stores are written to external memory. the target location of the store may be cached and is updated on a hit. memory coherency is enforced by hardware.

3.5.5.1 out-of-order execution and guarded memory

out-of-order execution occurs when the 603e performs operations in advance in case the result is needed. typically, these operations are performed by otherwise idle resources; thus if a result is not required, it is ignored and the out-of-order operation incurs no time penalty (typically).

supervisor-level programs designate memory as guarded on a block or page level. memory is designated as guarded if it may not be "well-behaved" with respect to out-of-order operations. for example, the memory area that contains a memory-mapped i/o device may be designated as guarded if an out-of-order load or instruction fetch performed to such a device might cause the device to perform unexpected or incorrect operations. another example of memory that should be designated as guarded is the area that corresponds to the device that resides at the highest implemented physical address (as it has no successor and out-of-order sequential operations such as instruction prefetching may result in a machine
check exception). in addition, areas that contain holes in the physical memory space may be designated as guarded.

3.5.5.2 effects of out-of-order data accesses

most data operations may be performed out-of-order, as long as the machine appears to follow a simple sequential model. however, the following out-of-order operations do not occur:

• out-of-order loading from guarded memory (g = 1) does not occur. however, when a load or store operation is required by the program, the entire cache block(s) containing the referenced data may be loaded into the cache.
• out-of-order store operations that alter the state of the target location do not occur.
• no errors except machine check exceptions are reported due to the out-of-order execution of an instruction until it is known that execution of the instruction is required.

machine check exceptions resulting solely from out-of-order execution (from nonguarded memory) may be reported. when an out-of-order instruction's result is abandoned, only one side effect (other than a possible machine check) may occur: the referenced bit (r) in the corresponding page table entry (and tlb entry) can be set due to an out-of-order load operation. see chapter 4, "exceptions," for more information on the machine check exception.

thus an out-of-order load or store instruction will not access guarded memory unless one of the following conditions exists:

• the target memory item is resident in an on-chip cache. in this case, the location may be accessed from the cache or main memory.
• the target memory item is cacheable (i = 0) and it is guaranteed that the load or store is in the execution path (assuming there are no intervening exceptions). in this case, the entire cache block containing the target may be loaded into the cache.
• the target memory is cache-inhibited (i = 1), the load or store instruction is in the execution path, and it is guaranteed that no prior instructions can cause an exception.

3.5.5.3 effects of out-of-order instruction fetches

to avoid instruction fetch delay, the processor typically fetches instructions ahead of those currently being executed. such instruction prefetching is said to be out-of-order in that prefetched instructions may not be executed due to intervening branches or exceptions.

during instruction prefetching, no errors except machine check exceptions are reported due to the out-of-order fetching of an instruction until it is known that execution of the instruction is required.

machine check exceptions resulting solely from out-of-order execution (from nonguarded memory) may be reported. when an out-of-order instruction's result is abandoned, only one side effect (other than a possible machine check) may occur: the referenced bit (r) in the
corresponding page table entry (and tlb entry) can be set due to an out-of-order load operation. see chapter 4, "exceptions," for more information on the machine check exception.

instruction fetching from guarded memory is not permitted.

3.6 cache coherency: mei protocol

the primary objective of a coherent memory system is to provide the same image of memory to all devices using the system. coherency allows synchronization and cooperative use of shared resources. otherwise, multiple copies of a memory location, some containing stale values, could exist in a system resulting in errors when the stale values are used. each potential bus master must follow rules for managing the state of its cache.

the 603e cache coherency protocol is a coherent subset of the standard mesi four-state cache protocol that omits the shared state. since data cannot be shared, the 603e signals all cache block fills as if they were write misses (read-with-intent-to-modify), which flushes the corresponding copies of the data in all caches external to the 603e prior to the 603e's cache block fill operation. following the cache block load, the 603e is the exclusive owner of the data and may write to it without a bus broadcast transaction.

to maintain this coherency, all global reads observed on the bus by the 603e are snooped as if they were writes, causing the 603e to write a modified cache block back to memory and invalidate the cache block, or simply invalidate the cache block if it is unmodified. the exception to this rule occurs when a snooped transaction is a caching-inhibited read (either burst or single-beat, where tt[0-4] = x1010; see table 7-1 for clarification), in which case the 603e does not invalidate the snooped cache block. if the cache block is modified, the block is written back to memory, and the cache block is marked exclusive unmodified.
if the cache block is marked exclusive unmodified when snooped, no bus action is taken, and the cache block remains in the exclusive unmodified state. this treatment of caching-inhibited reads decreases the possibility of data thrashing by allowing noncaching devices to read data without invalidating the entry from the 603e's data cache.

3.6.1 mei state definitions

the 603e's data cache characterizes each 32-byte block it contains as being in one of three mei states. addresses presented to the cache are indexed into the cache directory with bits a20-a26, and the upper-order 20 bits from the physical address translation (pa0-pa19) are compared against the indexed cache directory tags. if neither of the indexed tags matches, the result is a cache miss. if a tag matches, a cache hit occurred and the directory indicates the state of the cache block through two state bits kept with the tag. the three possible states for a cache block in the cache are the modified state (m), the exclusive state (e), and the invalid state (i). the three mei states are defined in table 3-2.
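the indexing described above can be sketched in python as a split of a 32-bit physical address into tag, index, and byte offset. the sketch assumes the usual powerpc bit numbering, in which bit 0 is the most significant bit, so that a20-a26 are the seven bits just above a 5-bit offset within the 32-byte block; the function name is hypothetical.

```python
BLOCK_BYTES = 32   # block size stated in the text
OFFSET_BITS = 5    # a27-a31, implied by the 32-byte block
INDEX_BITS = 7     # a20-a26
TAG_BITS = 20      # pa0-pa19


def split_address(addr):
    """Return (tag, index, offset) for a 32-bit physical address."""
    addr &= 0xFFFFFFFF
    tag = addr >> (INDEX_BITS + OFFSET_BITS)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    offset = addr & (BLOCK_BYTES - 1)
    return tag, index, offset
```

two addresses map to the same directory entry when their index fields match; a hit additionally requires the 20-bit tag to match one of the tags stored at that index.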
table 3-2. mei state definitions

mei state      definition
modified (m)   the addressed cache block is valid in the cache and only in the cache. the cache block is modified with respect to system memory; that is, the modified data in the cache block has not been written back to memory.
exclusive (e)  the addressed block is in this cache only. the data in this cache block is consistent with system memory.
invalid (i)    this state indicates that the addressed cache block is not resident in the cache.

3.6.2 mei state diagram

the 603e provides dedicated hardware to provide memory coherency by snooping bus transactions. the address retry capability of the 603e enforces the mei protocol, as shown in figure 3-4. figure 3-4 assumes that the wim bits for the page or block are set to 001; that is, write-back, caching-not-inhibited, and memory coherency enforced. section 3.10, "mei state transactions," provides a detailed list of mei transitions for various operations and wim bit settings.

figure 3-4. mei cache coherency protocol: state diagram (wim = 001)
[state diagram connecting the modified, exclusive, and invalid states]
bus transactions: sh = snoop hit; rh = read hit; rm = read miss; wh = write hit; wm = write miss; sh/crw = snoop hit, cacheable read/write; sh/cir = snoop hit, cache inhibited read
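the three-state protocol for wim = 001 can be written as a transition table. the python sketch below is a reading of the prose in section 3.6 (block fills leave the block exclusive, writes leave it modified, cacheable snoops invalidate, caching-inhibited read snoops leave the block exclusive); it is not transcribed from figure 3-4 itself, and the event labels and function name are illustrative assumptions.

```python
# Events follow the figure legend: RH/RM = read hit/miss, WH/WM = write
# hit/miss, SH_CRW = snoop hit (cacheable read/write), SH_CIR = snoop
# hit (caching-inhibited read).
TRANSITIONS = {
    ("I", "RM"): "E",
    ("I", "WM"): "M",
    ("E", "RH"): "E",
    ("E", "WH"): "M",
    ("E", "SH_CRW"): "I",
    ("E", "SH_CIR"): "E",
    ("M", "RH"): "M",
    ("M", "WH"): "M",
    ("M", "SH_CRW"): "I",
    ("M", "SH_CIR"): "E",
}

# Snoop hits on a modified block first write the block back to memory.
PUSH_REQUIRED = {("M", "SH_CRW"), ("M", "SH_CIR")}


def next_state(state, event):
    """Return (new_state, push) for a block in `state` seeing `event`."""
    key = (state, event)
    if key not in TRANSITIONS:
        raise ValueError("no transition for %r in state %r" % (event, state))
    return TRANSITIONS[key], key in PUSH_REQUIRED
```

note that no entry ever produces a shared state: every fill makes the 603e the exclusive owner, which is what allows later writes to proceed without a bus broadcast.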
3.6.3 mei hardware considerations

while the 603e provides the hardware required to monitor bus traffic for coherency, the 603e data cache tags are single ported, and a simultaneous load or store and snoop access represent a resource conflict. in general, the snoop access has highest priority and is given first access to the tags. the load or store access will then occur on the clock following the snoop. the snoop is not given priority into the tags when the snoop coincides with a tag write (for example, validation after a cache block load). in these situations, the snoop is retried and must re-arbitrate before the lookup is possible.

occasionally, cache snoops cannot be serviced and must be retried. these retries occur if the cache is busy with a burst read or write when the snoop operation takes place.

note that it is possible for a snoop to hit a modified cache block that is already in the process of being written to the copyback buffer for replacement purposes. if this happens, the 603e retries the snoop, and raises the priority of the cast-out operation to allow it to go to the bus before the cache block fill.

the global (gbl) signal, asserted as part of the address attribute field during a bus transaction, enables the snooping hardware of the 603e. address bus masters assert gbl to indicate that the current transaction is a global access (that is, an access to memory shared by more than one device). if gbl is not asserted for the transaction, that transaction is not snooped by the 603e. note that the gbl signal is not asserted for instruction fetches, and that gbl is asserted for all data read or write operations when using direct address translation. (note that direct address translation is referred to as the real addressing mode, not the direct-store segment, in the architecture specification.)

normally, gbl reflects the m-bit value specified for the memory reference in the corresponding translation descriptor(s).
care must be taken to minimize the number of pages marked as global, because the retry protocol enforces coherency and can use considerable bus bandwidth if much data is shared. therefore, available bus bandwidth can decrease as more traffic is marked global.

the 603e snoops a transaction if the transfer start (ts) and gbl signals are asserted together in the same bus clock (this is a qualified snooping condition). no snoop update to the 603e cache occurs if the snooped transaction is not marked global. also, because cache block cast-outs and snoop pushes do not require snooping, the gbl signal is not asserted for these operations.

when the 603e detects a qualified snoop condition, the address associated with the ts signal is compared with the cache tags. snooping finishes if no hit is detected. if, however, the address hits in the cache, the 603e reacts according to the mei protocol shown in figure 3-4.

to facilitate external monitoring of the internal cache tags, the cache set entry signals (cse[0-1]) represent in binary the cache set being replaced on read operations (including read-with-intent-to-modify operations). the cse[0-1] signals do not apply for write
operations to memory, or during non-cacheable or touch load operations. note that these signals are valid only for 603e burst operations. table 3-3 shows the cse[0-1] (cache set entry) encodings.

table 3-3. cse[0-1] signal encoding

cse[0-1]  cache set element
00        set 0
01        set 1
10        set 2
11        set 3

3.6.4 coherency precautions

the 603e supports a three-state coherency protocol that supports the modified, exclusive, and invalid (mei) cache states. this protocol is a compatible subset of the mesi four-state protocol and operates coherently in systems that contain four-state caches. in addition, the 603e does not broadcast cache operations caused by cache instructions. they are intended for the management of the local cache but not for other caches in the system.

3.6.4.1 coherency in single-processor systems

the following situations concerning coherency can be encountered within a single-processor system:

• load or store to a caching-inhibited page (wim = 0bx1x) and a cache hit occurs
  caching is inhibited for this page (i = 1). load or store operations to a caching-inhibited page that hit in the cache cause boundedly undefined results.
• store to a page marked write-through (wim = 0b10x) and a cache read hit to a modified cache block
  this page is marked as write-through (w = 1). the 603e pushes the modified cache block to memory and the block remains marked modified (m).

note that when wim bits are changed, it is critical that the cache contents should reflect the new wim bit settings. for example, if a block or page that had allowed caching becomes caching-inhibited, software should ensure that the appropriate cache blocks are flushed to memory and invalidated.

3.6.5 load and store coherency summary

table 3-4 provides a summary of memory coherency actions performed by the 603e on load operations. noncacheable cases are not part of this table.
table 3-5 provides an overview of memory coherency actions on store operations. this table does not include noncacheable or write-through cases. the read-with-intent-to-modify (rwitm) examples involve selecting a replacement class and casting out modified data that may have resided in that replacement class.

table 3-4. memory coherency actions on load operations

cache state  bus operation  artry       action
m            none           don't care  read from cache
e            none           don't care  read from cache
i            read           negated     load data and mark e
i            read           asserted    retry read operation

table 3-5. memory coherency actions on store operations

cache state  bus operation  artry       action
m            none           don't care  modify cache
e            none           don't care  modify cache, mark m
i            rwitm          negated     load data, modify it, mark m
i            rwitm          asserted    retry the rwitm

3.6.6 atomic memory references

the load word and reserve indexed (lwarx) and store word conditional indexed (stwcx.) instructions provide an atomic update function for a single, aligned word of memory. while an lwarx instruction will normally be paired with an stwcx. instruction with the same effective address, an stwcx. instruction to any address will cancel the reservation. for detailed information on these instructions, refer to chapter 2, "programming model," in this book and chapter 8, "instruction set," in the programming environments manual.

3.6.7 cache reaction to specific bus operations

there are several bus transaction types defined for the 603e bus. the 603e must snoop these transactions and perform the appropriate action to maintain memory coherency as shown in table 3-6. a processor may assert artry for any bus transaction due to internal conflicts that prevent the appropriate snooping. the transactions in table 3-6 correspond to the transfer type signals tt[0-4], which are described in section 7.2.4.1, "transfer type (tt[0-4])."
table 3-6. response to bus transactions

snooped transaction
    603e response

clean block
    no action is taken.

flush block
    no action is taken.

write-with-flush, write-with-flush-atomic
    write-with-flush and write-with-flush-atomic operations occur after the processor issues a store or stwcx. instruction, respectively.
    • if the addressed block is in the exclusive state, the address snoop forces the state of the addressed block to invalid.
    • if the addressed block is in the modified state, the address snoop causes artry to be asserted and initiates a push of the modified block out of the cache and changes the state of the block to invalid.
    the execution of an stwcx. instruction cancels the reservation associated with any address.

kill block
    the kill block operation is an address-only bus transaction initiated when a dcbz instruction is executed; when snooped by the 603e, the addressed cache block is invalidated if in the e state, or flushed to memory and invalidated if in the m state, and any associated reservation is canceled.

write-with-kill
    in a write-with-kill operation, the processor snoops the cache for a copy of the addressed block. if one is found, an additional snoop action is initiated internally and the cache block is forced to the i state, killing modified data that may have been in the block. any reservation associated with the block is also canceled.

read, read-atomic
    the read operation is used by most single-beat and burst read operations on the bus. all burst reads observed on the bus are snooped as if they were writes, causing the addressed cache block to be flushed. a read on the bus with the gbl signal asserted causes the following responses:
    • if the addressed block in the cache is invalid, the 603e takes no action.
    • if the addressed block in the cache is in the exclusive state, the block is invalidated.
    • if the addressed block in the cache is in the modified state, the block is flushed to memory and the block is invalidated.
    • if the snooped transaction is a caching-inhibited read, and the block in the cache is in the exclusive state, the snoop causes no bus activity and the block remains in the exclusive state. if the block is in the cache in the modified state, the 603e initiates a push of the modified block out to memory and marks the cache block as exclusive.
    read atomic operations appear on the bus in response to lwarx instructions and generate the same snooping responses as read operations.

read-with-intent-to-modify (rwitm), rwitm-atomic
    a rwitm operation is issued to acquire exclusive use of a memory location for the purpose of modifying it.
    • if the addressed block is invalid, the 603e takes no action.
    • if the addressed block in the cache is in the exclusive state, the 603e initiates an additional snoop action to change the state of the cache block to invalid.
    • if the addressed block in the cache is in the modified state, the block is flushed to memory and the block is invalidated.
    the rwitm atomic operations appear on the bus in response to stwcx. instructions and are snooped like rwitm instructions.

sync
    no action is taken.

tlb invalidate
    no action is taken.
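the responses in table 3-6 reduce to a short decision function. the python sketch below follows the table as written and covers only the cache state and whether a modified block is pushed to memory; artry signaling and reservation handling are left out. the transaction labels are illustrative, and "read-ci" is a hypothetical label for a caching-inhibited read.

```python
NO_ACTION = {"clean block", "flush block", "sync", "tlb invalidate"}
INVALIDATING = {
    "write-with-flush", "write-with-flush-atomic", "kill block",
    "read", "read-atomic", "rwitm", "rwitm-atomic",
}
KNOWN = NO_ACTION | INVALIDATING | {"write-with-kill", "read-ci"}


def snoop_response(transaction, state):
    """Return (new_state, push) for a snooped transaction.

    `state` is "M", "E", or "I"; `push` is True when a modified block
    is written to memory as part of the response.
    """
    if transaction not in KNOWN or state not in ("M", "E", "I"):
        raise ValueError("unknown transaction or state")
    if transaction in NO_ACTION or state == "I":
        return state, False
    if transaction == "read-ci":
        # Caching-inhibited read: the block is kept, in the exclusive state.
        return "E", state == "M"
    if transaction == "write-with-kill":
        # Modified data is killed rather than written back.
        return "I", False
    return "I", state == "M"
```

for instance, snoop_response("read", "M") returns ("I", True), while snoop_response("read-ci", "M") returns ("E", True): the block is pushed in both cases but is retained only for the caching-inhibited read.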
3.6.8 operations causing artry assertion

the following scenarios cause the 603e to assert the artry signal:

• snoop hits to a block in the m state (flush or clean)
  this case is a normal snoop hit and will result in artry being asserted if the snooped transaction was a "flush" or "clean" request. if the snooped transaction was a "kill" request, artry will not be asserted.
• snoop attempt during the last ta of a cache line fill
  in no-drtry mode, during the cycle that the last ta is asserted to the 603e on a cache line fill, the tag is being written to its new state by the 603e and is not accessible. this will result in a collision being signaled by asserting artry. with drtry enabled, the cache tags are inaccessible to a snoop operation one cycle after the last ta.
• snoop hit after the first ta of a burst load operation
  after the first ta of a burst load operation, the data tags are committed to being written; snoop operations cannot be serviced until the load completes, thereby causing the assertion of artry.
• snoop hits to line in the cast-out buffer
  the 603e's cast-out buffer is kept coherent with main memory, and snoop operations that hit in the cast-out buffer will cause the assertion of artry.
• snoop attempt during cycles that dcbz instruction or load or store operation is updating the tag
  during the execution of a dcbz instruction or during a load or store operation that requires a cache line cast-out, the cache tags will be inaccessible during the first and last cycle of the operation.
• snoop attempt during the cycle that a dcbf or dcbst instruction is updating the tag
  if the ea of a dcbf or dcbst instruction hits in the cache, the tag will be changed to its new state. during that clock, the tag is not accessible and snoop transactions during that cycle will cause the assertion of artry.
3.6.9 enveloped high-priority cache block push operation

in cases where the 603e has completed the address tenure of a read operation, and then detects a snoop hit to a modified cache block by another bus master, the 603e provides a high-priority push operation. if the address snooped is the same as the address of the data to be returned by the read operation, artry is asserted one or more times until the data tenure of the read operation is completed. the cache block push transaction can be enveloped within the address and data tenures of a read operation. this feature prevents deadlocks in system organizations that support multiple memory-mapped buses.
more specifically, the 603e internally detects the scenario where a load request is outstanding and the processor has pipelined a write operation on top of the load. normally, when the data bus is granted to the 603e, the resulting data bus tenure is used for the load operation. the enveloped high-priority cache block push feature defines a bus signal, the data bus write only qualifier (dbwo), which, when asserted with a qualified data bus grant, indicates that the resulting data tenure should be used for the store operation instead. this signal is described in section 8.10, "using data bus write only." note that the enveloped copyback operation is an internally pipelined bus operation.

3.7 cache control instructions

software must use the appropriate cache management instructions to ensure that caches are kept consistent when data is modified by the processor. when a processor alters a memory location that may be contained in an instruction cache, software must ensure that updates to memory are visible to the instruction fetching mechanism. although the instructions to enforce coherency vary among implementations and hence operating systems should provide a system service for this function, the following sequence is typical:

1. dcbst (update memory)
2. sync (wait for update)
3. icbi (invalidate copy in cache)
4. isync (invalidate copy in own instruction buffer)

these operations are necessary because the processor does not maintain instruction memory coherent with data memory. software is responsible for enforcing coherency of instruction caches and data memory. since instruction fetching may bypass the data cache, changes made to items in the data cache may not be reflected in memory until after the instruction fetch completes.

the powerpc architecture defines instructions for controlling both the instruction and data caches when they exist.
the 603e interprets the cache control instructions (icbi, dcbi, dcbt, dcbz, dcbst) as if they pertain only to the 603e's caches. they are not intended for use in managing other caches in the system.

the dcbz instruction causes an address-only broadcast on the bus if the contents of the block are from a page marked global through the wimg bits. this broadcast is performed for coherency reasons; the dcbz instruction is the only cache control instruction that can allocate and take new ownership of a line. note that if the hid0[abe] bit is set on a pid7v-603e processor, the execution of the dcbf, dcbi, and dcbst instructions will also cause an address-only broadcast.

the dcbz instruction is also the only cache operation that is snooped by the 603e. the cache instructions are intended primarily for the management of the on-chip cache, and do not perform address-only broadcasts for the maintenance of other caches in the system. the ability of the pid7v-603e to optionally perform address-only broadcasts when executing the dcbi, dcbf, and the dcbst instructions allows the coherency management of an external copyback l2 cache.
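the reason for the four-step sequence given at the start of section 3.7 can be seen in a toy python model of a processor with separate instruction and data caches that are not kept coherent with each other. the class and method names are hypothetical, each cache holds single values rather than 32-byte blocks, and sync and isync have no counterpart because the model runs strictly in order.

```python
class SplitCacheModel:
    """Toy processor with separate, non-coherent I and D caches."""

    def __init__(self):
        self.memory = {}
        self.dcache = {}   # addr -> (value, modified)
        self.icache = {}   # addr -> value

    def store(self, addr, value):
        # Write-back store: only the data cache is updated.
        self.dcache[addr] = (value, True)

    def fetch(self, addr):
        # Instruction fetch bypasses the data cache entirely.
        if addr not in self.icache:
            self.icache[addr] = self.memory.get(addr)
        return self.icache[addr]

    def dcbst(self, addr):
        # Step 1: write a modified data cache entry to memory.
        if addr in self.dcache and self.dcache[addr][1]:
            value = self.dcache[addr][0]
            self.memory[addr] = value
            self.dcache[addr] = (value, False)

    def icbi(self, addr):
        # Step 3: invalidate the instruction cache copy.
        self.icache.pop(addr, None)

    def make_code_visible(self, addr):
        self.dcbst(addr)
        # sync (step 2) and isync (step 4) order real hardware
        # operations; this sequential model has nothing to wait for.
        self.icbi(addr)
```

in this model, a fetch issued after store but before make_code_visible still returns the old instruction, because the new value exists only in the data cache.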
the other instructions do not broadcast either for the purpose of invalidating or flushing other caches in the system or for managing system resources. any bus activity caused by these instructions is the direct result of performing the operation in the 603e cache. note that a data access exception is generated if the effective address of a dcbi, dcbst, dcbf, or dcbz instruction cannot be translated due to the lack of a tlb entry. (note that exceptions are referred to as interrupts in the architecture specification.)

note that in the powerpc architecture, the term "cache block," or simply "block," when used in the context of cache implementations, refers to the unit of memory at which coherency is maintained. for the 603e this is the eight-word cache line. this value may be different for other powerpc implementations.

in-depth descriptions of coding these instructions are provided in chapter 3, "addressing modes and instruction set summary," and chapter 10, "instruction set," in the programming environments manual.

3.7.1 data cache block invalidate (dcbi) instruction

if the block containing the byte addressed by the ea is in the data cache, the cache block is invalidated regardless of whether the block is in the exclusive or modified state. if hid0[abe] is set on a pid7v-603e when a dcbi instruction is executed, the pid7v-603e will perform an address-only bus transaction. the dcbi instruction can only be executed when the 603e is in the supervisor state.

3.7.2 data cache block touch (dcbt) instruction

this instruction provides a method for improving performance through the use of software-initiated prefetch hints. the 603e performs the fetch for the cases when the address hits in the tlb or the bat registers, and when it is a permitted load access from the addressed page. the operation is treated similarly to a byte load operation with respect to coherency.
if the address translation does not hit in the tlb or bat mechanism, or if it does not have load access permission, the instruction is treated as a no-op. if the cache is locked or disabled, or if the access is to a page that is marked as guarded, the dcbt instruction is treated as a no-op. if the access is directed to a write-through or caching-inhibited page, the instruction is treated as a no-op. the dcbt instruction never affects the referenced or changed bits in the hashed page table. a successful dcbt instruction affects the state of the tlb and cache lru bits as defined by the lru algorithm. the touch load buffer will be marked invalid if the contents of the touch buffer have been moved to the cache, if any data cache management instruction has been executed, if a dcbz instruction is executed that matches the address of the cache block in the touch buffer, or if another dcbt instruction is executed.
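the no-op conditions listed above for dcbt can be collected into a single predicate. the following is a minimal python sketch of that reading, not a description of the hardware; the `TouchContext` type, its field names, and `dcbt_is_noop` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TouchContext:
    """hypothetical summary of the conditions section 3.7.2 lists for dcbt."""
    translation_hit: bool     # address hits in the tlb or a bat register
    load_permitted: bool      # load access is allowed for the addressed page
    cache_locked: bool
    cache_disabled: bool
    guarded: bool             # page is marked guarded
    write_through: bool       # page is marked write-through
    caching_inhibited: bool   # page is marked caching-inhibited


def dcbt_is_noop(ctx: TouchContext) -> bool:
    """return True when, per the text, dcbt is treated as a no-op."""
    if not ctx.translation_hit or not ctx.load_permitted:
        return True
    if ctx.cache_locked or ctx.cache_disabled or ctx.guarded:
        return True
    return ctx.write_through or ctx.caching_inhibited
```

a touch to an ordinary cacheable, permitted page is the only case in which this sketch reports that the prefetch is attempted.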
3.7.3 data cache block touch for store (dcbtst) instruction

the dcbtst instruction, like the data cache block touch instruction (dcbt), allows software to prefetch a cache block, in this case in anticipation of a store operation (read with intent to modify).

3.7.4 data cache block clear to zero (dcbz) instruction

if the block containing the byte addressed by the ea is in the data cache, all bytes are cleared. if the block containing the byte addressed by the ea is not in the data cache and the corresponding page is caching-allowed, the block is established in the data cache without fetching the block from main memory, and all bytes of the block are cleared. if the contents of the cache block are from a page marked global through the wim bits, an address-only bus transaction is run. if the page containing the byte addressed by the ea is caching-inhibited or write-through, then the system alignment exception handler is invoked. the dcbz instruction is treated as a store to the addressed byte with respect to address translation and protection.

3.7.5 data cache block store (dcbst) instruction

if the block containing the byte addressed by the ea is in coherency-required mode, and a block containing the byte addressed by the ea is in the data cache of any processor and has been modified, the writing of it to main memory is initiated. on a pid7v-603e, if the cache block is unmodified, hid0[abe] is set, and the contents of the cache block are from a page marked global through the wim bits, an address-only bus transaction is run. the function of this instruction is independent of the write-through and caching-inhibited/caching-allowed modes of the block containing the byte addressed by the ea. this instruction is treated as a load to the addressed byte with respect to address translation and protection.
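as a rough illustration of the dcbz cases in section 3.7.4, the python sketch below maps the page attributes and cache hit status to an outcome. it assumes the alignment exception takes precedence over any cache hit, which is how the text reads; the function name, parameter names, and result strings are hypothetical.

```python
def dcbz_outcome(block_in_cache: bool, caching_inhibited: bool,
                 write_through: bool, global_page: bool):
    """return (action, address_only_broadcast) for a dcbz, per section 3.7.4."""
    # caching-inhibited or write-through pages invoke the alignment handler
    if caching_inhibited or write_through:
        return ("alignment exception", False)
    if block_in_cache:
        action = "clear block in cache"
    else:
        # the block is established without fetching it from main memory
        action = "establish block without fetch and clear"
    # an address-only transaction is run only for pages marked global
    return (action, global_page)
```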
3.7.6 data cache block flush (dcbf) instruction

the action taken depends on the memory mode associated with the target, and on the state of the cache block. the list below describes the action taken for the various cases. the actions described are executed regardless of whether the page containing the addressed byte is in caching-inhibited or caching-allowed mode. the following actions occur in both coherency-required mode (wim = 0bxx1) and coherency-not-required mode (wim = 0bxx0).

the dcbf instruction causes the following cache activity:
- unmodified block: invalidates the block in the processor's cache.
- modified block: copies the block to memory and invalidates the data cache block.
- absent block: does nothing.

the 603e treats this instruction as a load to the addressed byte with respect to address translation and protection.

3.7.7 enforce in-order execution of i/o instruction (eieio)

as defined by the powerpc architecture, the eieio instruction provides an ordering function for the effects of load and store instructions executed by a given processor. executing eieio ensures that all memory accesses previously initiated by the given processor are completed with respect to main memory before any memory accesses subsequently initiated by the processor access main memory. the eieio instruction orders loads and stores to caching-inhibited memory only.

the eieio instruction is intended for use only in performing memory-mapped i/o operations. it enforces "strong" ordering of cache-inhibited memory accesses during i/o operations between the processor and i/o devices. when executed by the 603e, the eieio instruction is treated as a no-op; caching-inhibited load and store operations (inhibited by the wimg bits for the page) are performed in strict program order.

3.7.8 instruction cache block invalidate (icbi) instruction

the execution of an icbi instruction causes all four cache sets indexed by the ea to be marked invalid. no cache hit is required, and no mmu translation is performed.

3.7.9 instruction synchronize (isync) instruction

the isync instruction waits for all previous instructions to complete and then discards any previously fetched instructions, causing subsequent instructions to be fetched (or refetched) from memory and to execute in the context established by the previous instructions. this instruction has no effect on other processors or on their caches.
3.8 bus operations caused by cache control instructions

table 3-7 provides an overview of the bus operations initiated by cache control instructions. the cache control, tlb management, and synchronization instructions supported by the 603e may affect or be affected by the operation of the bus. none of the instructions will actively broadcast through address-only transactions on the bus (except for dcbz), and no broadcasts by other masters are snooped by the 603e (except for kills). the operation of the instructions, however, may indirectly cause bus transactions to be performed, or their completion may be linked to the bus. table 3-7 summarizes how these instructions may operate with respect to the bus.

note that table 3-7 assumes that the wim bits are set to 001; that is, since the cache is operating in write-back mode, caching is permitted and coherency is enforced. table 3-7 does not include noncacheable or write-through cases, nor does it completely describe the mechanisms for the operations described. for more information, see section 3.10, "mei state transactions."

for detailed information on the cache control instructions, refer to chapter 2, "programming model," in this book and chapter 8, "instruction set," in the programming environments manual. the 603e contains snooping logic to monitor the bus for these commands and the control logic required to keep the cache and the memory queues coherent. for additional details about the specific bus operations performed by the 603e, see chapter 8, "system interface operation."

table 3-7. bus operations caused by cache control instructions (wim = 001)

operation | cache state | next cache state | bus operations | comment
----------|-------------|------------------|----------------|--------
sync | don't care | no change | none | waits for memory queues to complete bus activity
icbi | don't care | i | none |
dcbi | don't care | i | none |
dcbf | i, e | i | none |
dcbf | m | i | write with kill | block is pushed
dcbst | i, e | no change | none |
dcbst | m | e | write | block is pushed
dcbz | i | m | write with kill |
dcbz | e, m | m | kill block | writes over modified data
dcbt | i | no change | read | fetched cache block is stored in touch load queue
dcbt | e, m | no change | none |
dcbtst | i | no change | read-with-intent-to-modify | fetched cache block is stored in touch load queue
dcbtst | e, m | no change | none |
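table 3-7 can be read as a lookup from (instruction, current cache state) to (next cache state, bus operation). the python sketch below encodes that reading for wim = 001 only. the names `_ROWS` and `bus_effect` are hypothetical, and the dcbz row for states e and m assumes the bus operation is "kill block" with "writes over modified data" as the comment, which is one plausible reading of the flattened table.

```python
STATES = ("I", "E", "M")

# (instruction, applicable current states, next state, bus operation)
# next state None means "no change"; bus operation None means no bus activity.
_ROWS = [
    ("sync",   STATES,     None, None),
    ("icbi",   STATES,     "I",  None),
    ("dcbi",   STATES,     "I",  None),
    ("dcbf",   ("I", "E"), "I",  None),
    ("dcbf",   ("M",),     "I",  "write with kill"),
    ("dcbst",  ("I", "E"), None, None),
    ("dcbst",  ("M",),     "E",  "write"),
    ("dcbz",   ("I",),     "M",  "write with kill"),
    ("dcbz",   ("E", "M"), "M",  "kill block"),
    ("dcbt",   ("I",),     None, "read"),
    ("dcbt",   ("E", "M"), None, None),
    ("dcbtst", ("I",),     None, "read-with-intent-to-modify"),
    ("dcbtst", ("E", "M"), None, None),
]


def bus_effect(instr: str, state: str):
    """return (next_state, bus_operation) for one row of table 3-7."""
    for name, states, next_state, bus_op in _ROWS:
        if name == instr and state in states:
            return (state if next_state is None else next_state, bus_op)
    raise KeyError(f"no table 3-7 row for {instr!r} in state {state!r}")
```

for example, `bus_effect("dcbf", "M")` gives the pushed-block case, while `bus_effect("dcbt", "E")` reports no state change and no bus activity.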
3.9 bus interface

the bus interface buffers bus requests from the instruction and data caches, and executes the requests per the 603e bus protocol. it includes address register queues, prioritization logic, and bus control logic. the bus interface also captures snoop addresses for snooping in the cache and in the address register queues, snoops for reservations, and holds the touch load address for the cache. all data storage for the address register buffers (load and store data buffers) is located in the cache section. the data buffers are considered temporary storage for the cache and not part of the bus interface.

the general functions and features of the bus interface are as follows:
- seven address register buffers that include the following:
  - instruction cache load address buffer
  - data cache load address buffer
  - data cache touch load address buffer (associated data block buffer located in cache)
  - data cache castout/store address buffer (associated data line buffer located in cache)
  - data cache snoop copyback address buffer (associated data line buffer located in cache)
  - reservation address buffer for snoop monitoring
- pipeline collision detection for data cache buffers
- reservation address snooping for lwarx/stwcx. instructions
- one-level address pipelining
- load ahead of store capability

a conceptual block diagram of the bus interface is shown in figure 3-5. the address register queues hold transaction requests that the bus interface may issue on the bus independently of the other requests. the bus interface may have up to two transactions operating on the bus at any given time through the use of address pipelining.
figure 3-5. bus interface address buffers
[figure not reproduced. recoverable labels: i-cache, d-cache, biu, snoop addr, control, addr, data, system bus, i-cache ld addr, d-cache ld addr, d-cache tld addr, d-cache cst/st addr, d-cache snp addr]

for additional information about the 603e bus interface and the bus protocols, refer to chapter 8, "system interface operation."

3.10 mei state transactions

table 3-8 shows mei state transitions for various operations. bus operations are described in table 3-6.

table 3-8. mei state transitions

operation | cache operation | bus sync | wim | current state | next state | cache actions | bus operation
----------|-----------------|----------|-----|---------------|------------|---------------|--------------
load (t = 0) | read | no | x0x | i | same | 1 cast out of modified block (as required) | write-with-kill
 | | | | | | 2 pass four-beat read to memory queue | read
load (t = 0) | read | no | x0x | e, m | same | read data from cache |
load (t = 0) | read | no | x1x | i | same | pass single-beat read to memory queue | read
load (t = 0) | read | no | x1x | e | i | crtry read |
load (t = 0) | read | no | x1x | m | i | crtry read (push sector to write queue) | write-with-kill
lwarx | read | acts like other reads but bus operation uses special encoding
store (t = 0) | write | no | 00x | i | same | 1 cast out of modified block (if necessary) | write-with-kill
 | | | | | | 2 pass rwitm to memory queue | rwitm
store (t = 0) | write | no | 00x | e, m | m | write data to cache |
store ≠ stwcx. (t = 0) | write | no | 10x | i | same | pass single-beat write to memory queue | write-with-flush
store ≠ stwcx. (t = 0) | write | no | 10x | e | same | 1 write data to cache; 2 pass single-beat write to memory queue | write-with-flush
store ≠ stwcx. (t = 0) | write | no | 10x | m | same | 1 crtry write; 2 push block to write queue | write-with-kill
store (t = 0) or stwcx. (wim = 10x) | write | no | x1x | i | same | pass single-beat write to memory queue | write-with-flush
store (t = 0) or stwcx. (wim = 10x) | write | no | x1x | e | i | crtry write |
store (t = 0) or stwcx. (wim = 10x) | write | no | x1x | m | i | 1 crtry write; 2 push block to write queue | write-with-kill
stwcx. | conditional write | if the reserved bit is set, this operation is like other writes except the bus operation uses a special encoding.
dcbf | data cache block flush | no | xxx | i, e | same | 1 crtry dcbf; 2 pass flush | flush
 | | | | same | i | 3 state change only |
dcbf | data cache block flush | no | xxx | m | i | push block to write queue | write-with-kill
dcbst | data cache block store | no | xxx | i, e | same | 1 crtry dcbst; 2 pass clean | clean
 | | | | same | same | 3 no action |
dcbst | data cache block store | no | xxx | m | e | push block to write queue | write-with-kill
dcbz | data cache block set to zero | no | x1x | x | x | alignment trap |
dcbz | data cache block set to zero | no | 10x | x | x | alignment trap |
dcbz | data cache block set to zero | yes | 00x | i | same | 1 crtry dcbz; 2 cast out of modified block | write-with-kill
 | | | | | | 3 pass kill | kill
 | | | | same | m | 4 clear block |
dcbz | data cache block set to zero | no | 00x | e, m | m | clear block |
dcbt | data cache block touch | no | x1x | i | same | pass single-beat read to memory queue | read
dcbt | data cache block touch | no | x1x | e | i | crtry read |
dcbt | data cache block touch | no | x1x | m | i | 1 crtry read; 2 push block to write queue | write-with-kill
dcbt | data cache block touch | no | x0x | i | same | 1 cast out of modified block (as required) | write-with-kill
 | | | | | | 2 pass four-beat read to memory queue | read
dcbt | data cache block touch | no | x0x | e, m | same | no action |
single-beat read | reload dump 1 | no | xxx | i | same | forward data_in |
four-beat read (double-word-aligned) | reload dump | no | xxx | i | e | write data_in to cache |
four-beat write (double-word-aligned) | reload dump | no | xxx | i | m | write data_in to cache |
e -> i | snoop write or kill | no | xxx | e | i | state change only (committed) |
m -> i | snoop kill | no | xxx | m | i | state change only (committed) |
push m -> i | snoop flush | no | xxx | m | i | conditionally push | write-with-kill
push m -> e | snoop clean | no | xxx | m | e | conditionally push | write-with-kill
tlbie | tlb invalidate | no | xxx | x | x | 1 crtry tlbi; 2 pass tlbi; 3 no action |
sync | synchronization | no | xxx | x | x | 1 crtry sync; 2 pass sync; 3 no action |

note that single-beat writes are not snooped in the write queue.
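the snoop rows of table 3-8 describe how a bus operation from another master changes the state of a cached block. the python sketch below encodes only those four rows. the names `SNOOP_TRANSITIONS` and `snoop` are hypothetical, and combinations that the table does not list return None rather than a guessed result.

```python
# (current state, snooped operation) -> (next state, resulting bus operation)
# a bus operation of None means a state change only; the push on a snooped
# flush or clean is described in the table as conditional.
SNOOP_TRANSITIONS = {
    ("E", "write"): ("I", None),
    ("E", "kill"):  ("I", None),
    ("M", "kill"):  ("I", None),
    ("M", "flush"): ("I", "write-with-kill"),
    ("M", "clean"): ("E", "write-with-kill"),
}


def snoop(state: str, operation: str):
    """return (next_state, bus_operation), or None if table 3-8 has no such row."""
    return SNOOP_TRANSITIONS.get((state, operation))
```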
chapter 4
exceptions

the powerpc exception mechanism allows the processor to change to supervisor state as a result of external signals, errors, or unusual conditions arising in the execution of instructions. these exceptions differ from the arithmetic exceptions defined by the ieee for floating-point operations. when exceptions (referred to as interrupts in the architecture specification) occur, information about the state of the processor is saved to certain registers and the processor begins execution at an address (exception vector) predetermined for each exception. processing of exceptions occurs in supervisor mode.

although multiple exception conditions can map to a single exception vector, a more specific condition may be determined by examining a register associated with the exception, for example, the dsisr or the fpscr. additionally, certain exception conditions can be explicitly enabled or disabled by software.

the powerpc architecture requires that exceptions be handled in program order; therefore, although a particular implementation may recognize exception conditions out of order, they are handled strictly in order with respect to the instruction stream. when an instruction-caused exception is recognized, any unexecuted instructions that appear earlier in the instruction stream, including any that have not yet entered the execute state, are required to complete before the exception is taken. any exceptions caused by those instructions are handled first. likewise, exceptions that are asynchronous and precise are recognized when they occur, but are not handled until the instruction currently in the completion stage successfully completes execution or generates an exception, and the completed store queue is emptied.

an instruction is said to have "completed" when the results of that instruction's execution have been committed to the registers defined by the architecture (for example, the gprs or fprs, rather than rename buffers).
if a single instruction encounters multiple exception conditions, those exceptions are taken and handled sequentially. likewise, exceptions that are asynchronous are recognized when they occur, but are not handled until the next instruction to complete in program order successfully completes. throughout this chapter, the term "next instruction" implies the next instruction to complete in program order.

note that exceptions can occur while an exception handler routine is executing, and multiple exceptions can become nested. it is up to the exception handler to save the states to allow control to ultimately return to the original excepting program.
unless a catastrophic condition causes a system reset or machine check exception, only one exception is handled at a time. if, for example, a single instruction encounters multiple exception conditions, those conditions are handled sequentially. after the exception handler handles an exception, the instruction execution continues until the next exception condition is encountered. however, in many cases there is no attempt to re-execute the instruction. this method of recognizing and handling exception conditions sequentially guarantees that exceptions are recoverable.

exception handlers should save the information stored in srr0 and srr1 early, and before enabling external interrupts, to prevent the program state from being lost due to a system reset or machine check exception or to an instruction-caused exception in the exception handler.

in this chapter, the following terminology is used to describe the various stages of exception processing:
- recognition: exception recognition occurs when the condition that can cause an exception is identified by the processor.
- taken: an exception is said to be taken when control of instruction execution is passed to the exception handler; that is, the context is saved, the instruction at the appropriate vector offset is fetched, and the exception handler routine is executed in supervisor mode.
- handling: exception handling is performed by the software linked to the appropriate vector offset. exception handling is performed at supervisor level.

4.1 exception classes

the powerpc architecture supports four types of exceptions:

synchronous, precise: these are caused by instructions. all instruction-caused exceptions are handled precisely; that is, the machine state at the time the exception occurs is known and can be completely restored.
this means that (excluding the trap and system call exceptions) the address of the faulting instruction is provided to the exception handler and that neither the faulting instruction nor subsequent instructions in the code stream will complete execution before the exception is taken. once the exception is processed, execution resumes at the address of the faulting instruction (or at an alternate address provided by the exception handler). when an exception is taken due to a trap or system call instruction, execution resumes at an address provided by the handler.
synchronous, imprecise: the powerpc architecture defines two imprecise floating-point exception modes, recoverable and nonrecoverable. even though the powerpc 603e provides a means to enable the imprecise modes, it implements these modes identically to the precise mode (that is, all enabled floating-point exceptions are always precise on the 603e). (the EC603E microprocessor does not support floating-point operations.)

asynchronous, maskable: the external, system management interrupt (smi), and decrementer exceptions are maskable asynchronous exceptions. when these exceptions occur, their handling is postponed until the next instruction, and any exceptions associated with that instruction, completes execution. if there are no instructions in the execution units, the exception is taken immediately upon determination of the correct restart address (for loading srr0).

asynchronous, nonmaskable: there are two nonmaskable asynchronous exceptions: system reset and the machine check exception. these exceptions may not be recoverable, or may provide a limited degree of recoverability. all exceptions report recoverability through the msr[ri] bit.

the 603e exception classes are shown in table 4-1. although exceptions have other characteristics as well, such as whether they are maskable or nonmaskable, the distinctions shown in table 4-1 define categories of exceptions that the 603e handles uniquely. note that table 4-1 includes no synchronous imprecise exceptions. while the powerpc architecture supports imprecise handling of floating-point exceptions, the 603e, with the exception of the EC603E microprocessor, implements floating-point exception modes as precise exceptions. (the EC603E microprocessor does not support floating-point operations.)

although the powerpc architecture specifies that the recognition of the machine check exception is nonmaskable, on the 603e the stimuli that cause this exception are maskable.
for example, the machine check exception is caused by the assertion of tea, ape, dpe, or mcp. however, the mcp, ape, and dpe signals can be disabled by bits 0, 2, and 3 respectively in hid0. therefore, the machine check caused by tea is the only truly nonmaskable machine check exception.

table 4-1. exception classifications

synchronous/asynchronous | precise/imprecise | exception type
-------------------------|-------------------|---------------
asynchronous, nonmaskable | imprecise | machine check; system reset
asynchronous, maskable | precise | external interrupt; decrementer; system management interrupt
synchronous | precise | instruction-caused exceptions
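the masking of machine check sources described above can be sketched as a small filter. this python sketch assumes powerpc bit numbering (bit 0 is the most significant bit of the 32-bit register) and assumes that a set hid0 bit enables the corresponding source; the text gives the bit positions but not the polarity. the names `ppc_bit`, `HID0_BIT_FOR`, and `machine_check_sources` are hypothetical.

```python
def ppc_bit(n: int) -> int:
    """mask for bit n of a 32-bit register, with bit 0 as the most significant bit."""
    return 1 << (31 - n)


# hid0 bit positions quoted in the text for the maskable sources
HID0_BIT_FOR = {"mcp": 0, "ape": 2, "dpe": 3}


def machine_check_sources(hid0: int, asserted: set) -> list:
    """return the asserted signals that can raise a machine check condition."""
    recognized = []
    for signal in ("tea", "mcp", "ape", "dpe"):
        if signal not in asserted:
            continue
        bit = HID0_BIT_FOR.get(signal)
        # assumption: bit set means enabled; tea has no hid0 control at all
        if bit is None or hid0 & ppc_bit(bit):
            recognized.append(signal)
    return recognized
```

with hid0 cleared, only tea survives the filter, which matches the statement that tea is the only truly nonmaskable source.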
the 603e's exceptions, and the conditions that cause them, are listed in figure 4-1. exceptions that are specific to either the pid6-603e or pid7v-603e, or that are handled differently on the EC603E microprocessor, are indicated.

figure 4-1. exceptions and conditions
(each entry gives the exception type and its vector offset in hex, followed by the causing conditions.)

reserved (00000)

system reset (00100)
  a system reset is caused by the assertion of either sreset or hreset.

machine check (00200)
  a machine check is caused by the assertion of the tea signal during a data bus transaction, assertion of mcp, or an address or data parity error.

dsi (00300)
  the cause of a dsi exception can be determined by the bit settings in the dsisr, listed as follows:
  - bit 1: set if the translation of an attempted access is not found in the primary hash table entry group (hteg), or in the rehashed secondary hteg, or in the range of a dbat register; otherwise cleared.
  - bit 4: set if a memory access is not permitted by the page or dbat protection mechanism; otherwise cleared.
  - bit 5: set by an eciwx or ecowx instruction if the access is to an address that is marked as write-through, or by execution of a load/store instruction that accesses a direct-store segment.
  - bit 6: set for a store operation and cleared for a load operation.
  - bit 11: set if eciwx or ecowx is used and ear[e] is cleared.

isi (00400)
  an isi exception is caused when an instruction fetch cannot be performed for any of the following reasons:
  - the effective (logical) address cannot be translated. that is, there is a page fault for this portion of the translation, so an isi exception must be taken to load the pte (and possibly the page) into memory.
  - the fetch access is to a direct-store segment (indicated by srr1[3] set).
  - the fetch access violates memory protection (indicated by srr1[4] set). if the key bits (ks and kp) in the segment register and the pp bits in the pte are set to prohibit read access, instructions cannot be fetched from this location.
external interrupt (00500)
  an external interrupt is caused when msr[ee] = 1 and the int signal is asserted.

alignment (00600)
  an alignment exception is caused when the 603e cannot perform a memory access for any of the reasons described below:
  - the operand of a floating-point load or store instruction is not word-aligned.
  - the operand of an lmw, stmw, lwarx, or stwcx. instruction is not aligned.
  - the operand of a single-register load or store operation is not aligned, and the 603e is in little-endian mode (pid6-603e only).
  - the execution of a floating-point load or store instruction to a direct-store segment.
  - the operand of a load, store, load multiple, store multiple, load string, or store string instruction crosses a segment boundary into a direct-store segment, or crosses a protection boundary.
  - execution of a misaligned eciwx or ecowx instruction (pid7v-603e only).
  - the instruction is lmw, stmw, lswi, lswx, stswi, or stswx and the 603e is in little-endian mode.
  - the operand of dcbz is in memory that is write-through-required or caching-inhibited.
program (00700)
  a program exception is caused by one of the following exception conditions, which correspond to bit settings in srr1 and arise during execution of an instruction:
  - floating-point enabled exception: a floating-point enabled exception condition is generated when the following condition is met: (msr[fe0] | msr[fe1]) & fpscr[fex] is 1. fpscr[fex] is set by the execution of a floating-point instruction that causes an enabled exception or by the execution of one of the "move to fpscr" instructions that results in both an exception condition bit and its corresponding enable bit being set in the fpscr. (not supported by the EC603E microprocessor.)
  - illegal instruction: an illegal instruction program exception is generated when execution of an instruction is attempted with an illegal opcode or illegal combination of opcode and extended opcode fields (including powerpc instructions not implemented in the 603e), or when execution of an optional instruction not provided in the 603e is attempted (these do not include those optional instructions that are treated as no-ops).
  - privileged instruction: a privileged instruction type program exception is generated when the execution of a privileged instruction is attempted and the msr register user privilege bit, msr[pr], is set. in the 603e, this exception is generated for mtspr or mfspr with an invalid spr field if spr[0] = 1 and msr[pr] = 1. this may not be true for all powerpc processors.
  - trap: a trap type program exception is generated when any of the conditions specified in a trap instruction is met.

floating-point unavailable (00800)
  a floating-point unavailable exception is caused by an attempt to execute a floating-point instruction (including floating-point load, store, and move instructions) when the floating-point available bit is disabled (msr[fp] = 0). note that the EC603E microprocessor takes a floating-point unavailable exception when execution of a floating-point instruction is attempted.

decrementer (00900)
  the decrementer exception occurs when the most significant bit of the decrementer (dec) register transitions from 0 to 1. the exception must also be enabled with the msr[ee] bit.

reserved (00a00-00bff)

system call (00c00)
  a system call exception occurs when a system call (sc) instruction is executed.

trace (00d00)
  a trace exception is taken when msr[se] = 1 or when the currently completing instruction is a branch and msr[be] = 1.

reserved (00e00)
  the 603e does not generate an exception to this vector. other powerpc processors may use this vector for floating-point assist exceptions.

reserved (00e10-00fff)

instruction translation miss (01000)
  an instruction translation miss exception is caused when an effective address for an instruction fetch cannot be translated by the itlb.
data load translation miss (01100)
  a data load translation miss exception is caused when an effective address for a data load operation cannot be translated by the dtlb.

data store translation miss (01200)
  a data store translation miss exception is caused when an effective address for a data store operation cannot be translated by the dtlb, or where a dtlb hit occurs, and the change bit in the pte must be set due to a data store operation.

instruction address breakpoint (01300)
  an instruction address breakpoint exception occurs when the address (bits 0-29) in the iabr matches the next instruction to complete in the completion unit, and the iabr enable bit (bit 30) is set.

system management interrupt (01400)
  a system management interrupt is caused when msr[ee] = 1 and the smi input signal is asserted.

reserved (01500-02fff)

exceptions are roughly prioritized by exception class, as follows:
1. nonmaskable, asynchronous exceptions (system reset and machine check exceptions) have priority over all other exceptions, although the machine check exception condition can be disabled so that the condition causes the processor to go directly into the checkstop state. these exceptions cannot be delayed, and do not wait for the completion of any precise exception handling.
2. synchronous, precise exceptions are caused by instructions and are taken in strict program order.
3. maskable asynchronous exceptions (external interrupt and decrementer exceptions) are delayed until higher priority exceptions are taken.

system reset and machine check exceptions may occur at any time and are not delayed even if an exception is being handled. as a result, state information for the interrupted exception may be lost; therefore, these exceptions are typically nonrecoverable. all other exceptions have lower priority than system reset and machine check exceptions, and the exception may not be taken immediately when it is recognized.
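the dsisr bit settings listed for the dsi exception in figure 4-1 can be decoded with a short helper. this python sketch assumes powerpc bit numbering, in which bit 0 is the most significant bit of the 32-bit register; `DSISR_BITS` and `decode_dsisr` are hypothetical names, and the descriptions are abbreviated from the figure.

```python
# dsisr bit number -> abbreviated meaning, as listed for dsi in figure 4-1
DSISR_BITS = {
    1: "translation not found in primary hteg, secondary hteg, or dbat range",
    4: "access not permitted by page or dbat protection",
    5: "eciwx/ecowx to write-through address, or load/store to direct-store segment",
    6: "store operation (cleared for a load)",
    11: "eciwx or ecowx used with ear[e] cleared",
}


def decode_dsisr(value: int) -> list:
    """return [(bit, meaning), ...] for every listed dsisr bit set in value."""
    found = []
    for bit in sorted(DSISR_BITS):
        mask = 1 << (31 - bit)   # bit 0 is the most significant bit
        if value & mask:
            found.append((bit, DSISR_BITS[bit]))
    return found
```

a handler might call `decode_dsisr` on the saved dsisr value to tell a protection violation on a store (bits 4 and 6) from a missing translation on a load (bit 1 only).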
4.1.1 exception priorities

the exceptions are listed in table 4-2 in order of highest to lowest priority.

table 4-2. exception priorities

exception category | priority | exception | cause
-------------------|----------|-----------|------
asynchronous | 0 | system reset | hreset or power-on reset
asynchronous | 1 | machine check | tea, mcp, ape, or dpe
asynchronous | 2 | system reset | sreset
asynchronous | 3 | system management interrupt | smi
asynchronous | 4 | external interrupt | int
asynchronous | 5 | decrementer exception | decrementer passed through 0x00000000
instruction fetch | 0 | itlb miss | instruction tlb miss
instruction fetch | 1 | instruction access | instruction access exception
instruction dispatch/execution | 0 | iabr | instruction address breakpoint exception
instruction dispatch/execution | 1 | program | program exception due to the following: illegal instruction; privileged instruction; trap
instruction dispatch/execution | 2 | system call | system call exception
instruction dispatch/execution | 3 | floating-point unavailable | floating-point unavailable exception due to the following: 603e microprocessor, floating-point unavailable exception; EC603E microprocessor, execution of a floating-point instruction
instruction dispatch/execution | 4 | program | program exception due to a floating-point enabled exception
instruction dispatch/execution | 5 | alignment | alignment exception due to the following: floating-point not word-aligned (not applicable to the EC603E microprocessor); lmw, stmw, lwarx, or stwcx. not word-aligned; little-endian access is misaligned; multiple or string access with little-endian bit set
instruction dispatch/execution | 6 | data access | data access exception due to a bat page protection violation
instruction dispatch/execution | 7 | data access | data access exception due to the following: eciwx, ecowx, lwarx, or stwcx. to direct-store segment (bit 5 of dsisr); crossing from memory segment to direct-store segment (bit 0 of dsisr); crossing from direct-store segment to memory segment; any access to direct-store, sr[t] = 1; eciwx or ecowx with ear[e] = 0 (bit 11 of dsisr)
instruction dispatch/execution | 8 | dtlb miss | data tlb miss exception due to: store miss; load miss
instruction dispatch/execution | 9 | alignment | alignment exception due to a dcbz to a write-through or caching-inhibited page
instruction dispatch/execution | 10 | data access | data access exception due to tlb page protection violation
instruction dispatch/execution | 11 | dtlb miss | data tlb miss exception due to a change bit not set on a store operation
post-instruction execution | 0 | trace | trace exception due to the following: msr[se] = 1; msr[be] = 1 for branches

exception priorities are described in detail in "exception priorities," in chapter 6, "exceptions," in the programming environments manual.
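the asynchronous rows of table 4-2 define a fixed ordering, so choosing among several pending asynchronous exceptions reduces to a scan of that order. the python sketch below illustrates this; the string keys, `ASYNC_PRIORITY`, and `highest_priority_async` are hypothetical names, and the sketch ignores masking (msr[ee], msr[me]) and recoverability.

```python
# asynchronous exceptions in table 4-2 order, highest priority (0) first
ASYNC_PRIORITY = [
    "system reset (hreset or por)",
    "machine check",
    "system reset (sreset)",
    "system management interrupt",
    "external interrupt",
    "decrementer",
]


def highest_priority_async(pending: set):
    """return the pending asynchronous exception that table 4-2 ranks first, or None."""
    for name in ASYNC_PRIORITY:
        if name in pending:
            return name
    return None
```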
4.1.2 summary of front-end exception handling

the following list of interrupt categories describes how the 603e handles exceptions up to the point of signaling the appropriate exception to occur. note that a recoverable state is reached if the completed store queue is empty (drained, not canceled) and any instruction that is next in program order and has been signaled to complete has completed. if msr[ri] is clear, the 603e is in a nonrecoverable state by default. also, completion of an instruction is defined as performing all architectural register writes associated with that instruction, and then removing that instruction from the completion buffer queue.

- asynchronous nonmaskable nonrecoverable (system reset caused by the assertion of either hreset or internally during power-on reset (por)): these exceptions have highest priority and are taken immediately regardless of other pending exceptions or recoverability. a nonpredicted address is guaranteed.
- asynchronous maskable nonrecoverable (machine check): a machine check exception takes priority over any other pending exception except a nonrecoverable system reset caused by the assertion of either hreset or internally during por. a machine check exception is taken immediately regardless of recoverability. a machine check exception can occur only if the machine check enable bit, msr[me], is set. if msr[me] is cleared, the processor goes directly into checkstop state when a machine check exception condition occurs. a nonpredicted address is guaranteed.
- asynchronous nonmaskable recoverable (system reset caused by the assertion of sreset): this interrupt takes priority over any other pending exceptions except the nonrecoverable exceptions listed above. this exception is taken immediately when a recoverable state is reached.
- asynchronous maskable recoverable (system management interrupt, external interrupt, decrementer exception): before handling this type of exception, the next instruction in program order must complete or except. if this action causes another type of exception, that exception is taken and the asynchronous maskable recoverable exception remains pending. once an instruction can complete without causing an exception, further instruction completion is halted while the exception not taken remains pending. the exception is taken when a recoverable state is reached.
- instruction fetch (itlb, isi): when this type of exception is detected, dispatch is halted and the current instruction stream is allowed to drain. if completing any instructions in this stream causes an exception, that exception is taken and the instruction fetch exception is forgotten. otherwise, as soon as the machine is empty and a recoverable state is reached, the instruction fetch exception is taken.
- Instruction dispatch/execution (program, DSI, alignment, emulation trap, system call, DTLB miss on load or store, IABR). This type of exception is determined at dispatch or execution of an instruction. The exception remains pending until all instructions in program order before the exception-causing instruction are completed. The exception is then taken without completing the exception-causing instruction. If any other exception condition is created in completing these previous instructions in the machine, that exception takes priority over the pending instruction dispatch/execution exception, which will then be forgotten.

- Post-instruction execution (trace). This type of exception is generated following execution and completion of an instruction while a trace mode is enabled. If executing the instruction produces conditions for another type of exception, that exception is taken and the post-instruction execution exception is forgotten for that instruction.

4.2 Exception Processing

When an exception is taken, the processor uses the save/restore registers, SRR0 and SRR1, to save the contents of the machine state register for user-level mode (referred to as problem mode in the architecture specification) and to identify where instruction execution should resume after the exception is handled.

When an exception occurs, SRR0 is set to point to the instruction at which instruction processing should resume when the exception handler returns control to the interrupted process. All instructions in the program flow preceding this one will have completed and no subsequent instruction will have completed. This may be the address of the instruction that caused the exception or the next one (as in the case of a system call exception). The instruction addressed can be determined from the exception type and status bits. This address is used to resume instruction processing in the interrupted process, typically when an rfi instruction is executed. The SRR0 register is shown in Figure 4-2.

Figure 4-2. Machine Status Save/Restore Register 0
[Bits 0–31: SRR0 (holds EA for resuming program execution)]

The save/restore register 1 (SRR1) is used to save machine status (the contents of the MSR) on exceptions and to restore those values when rfi is executed. SRR1 is shown in Figure 4-3.

Figure 4-3. Machine Status Save/Restore Register 1
[Bits 0–31: exception-specific information and MSR bit values]

Typically, when an exception occurs, bits 0–15 of SRR1 are loaded with exception-specific information and bits 16–31 of the MSR are placed into the corresponding bit positions of SRR1.

The 603e loads SRR1 with specific bits for handling machine check exceptions, as shown in Table 4-3.

Table 4-3. SRR1 Bit Settings for Machine Check Exceptions

Bits    Name         Description
0       MSR[0]       Copy of MSR bit 0
1–4     (none)       Reserved
5–9     MSR[5–9]     Copy of MSR bits 5–9
10–11   (none)       Reserved
12      MCP          Machine check
13      TEA          TEA error
14      DPE          Data parity error
15      APE          Address parity error
16–31   MSR[16–31]   Copy of MSR bits 16–31

The 603e loads SRR1 with specific bits for handling the three TLB miss exceptions, as shown in Table 4-4.

Table 4-4. SRR1 Bit Settings for Software Table Search Operations

Bits    Name         Description
0–3     CRF0         Copy of condition register field 0 (CR0)
4       (none)       Reserved
5–9     MSR[5–9]     Copy of MSR bits 5–9
10–11   (none)       Reserved
12      KEY          TLB miss protection key
13      I/D          Instruction/data TLB miss
                       0  DTLB miss
                       1  ITLB miss
14      WAY          Indicates which TLB associativity set should be replaced
                       0  Set 0
                       1  Set 1
15      S/L          Store/load protection instruction
                       0  Load miss
                       1  Store miss
16–31   MSR[16–31]   Copy of MSR bits 16–31
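The bit assignments in Tables 4-3 and 4-4 can be illustrated with a small decoder. The following Python sketch is only an illustration and is not part of the manual: the function names are hypothetical, and it assumes the PowerPC convention that bit 0 is the most-significant bit of the 32-bit register.

```python
def ppc_bit(n: int) -> int:
    """Return the mask for bit n of a 32-bit register, where bit 0 is the MSB."""
    return 1 << (31 - n)


def decode_machine_check_srr1(srr1: int) -> dict:
    """Report which Table 4-3 cause bits (12-15) are set in SRR1."""
    names = {12: "MCP", 13: "TEA", 14: "DPE", 15: "APE"}
    return {name: bool(srr1 & ppc_bit(pos)) for pos, name in names.items()}


def decode_tlb_miss_srr1(srr1: int) -> dict:
    """Split SRR1 into the Table 4-4 fields used by software table search."""
    return {
        "crf0": (srr1 >> 28) & 0xF,                            # bits 0-3
        "key": int(bool(srr1 & ppc_bit(12))),                  # bit 12
        "miss": "ITLB" if srr1 & ppc_bit(13) else "DTLB",      # bit 13
        "way": int(bool(srr1 & ppc_bit(14))),                  # bit 14
        "access": "store" if srr1 & ppc_bit(15) else "load",   # bit 15
    }


if __name__ == "__main__":
    print(decode_machine_check_srr1(0x00040000))  # only bit 13 (TEA) is set
    print(decode_tlb_miss_srr1(0x80030000))       # DTLB store miss, way 1
```

A TLB miss handler would typically test the I/D, WAY, and S/L fields in this manner before selecting the TLB entry to replace.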
Note that in some implementations, every instruction fetch when MSR[IR] = 1 and every instruction execution requiring address translation when MSR[DR] = 1 may modify SRR1.

The MSR is shown in Figure 4-4. When an exception occurs, MSR bits, as described in Table 4-5, are altered as determined by the exception.

Figure 4-4. Machine State Register (MSR)
[Bits 0–12: reserved; 13: POW; 14: TGPR; 15: ILE; 16: EE; 17: PR; 18: FP; 19: ME; 20: FE0; 21: SE; 22: BE; 23: FE1; 24: reserved; 25: IP; 26: IR; 27: DR; 28–29: reserved; 30: RI; 31: LE]

Table 4-5 shows the bit definitions for the MSR. Full function reserved bits are saved in SRR1 when an exception occurs; partial function reserved bits are not saved.

Table 4-5. MSR Bit Settings

Bit(s)  Name    Description
0       (none)  Reserved. Full function.
1–4     (none)  Reserved. Partial function.
5–9     (none)  Reserved. Full function.
10–12   (none)  Reserved. Partial function.
13      POW     Power management enable (603e-specific)
                  0  Disables programmable power modes (normal operation mode).
                  1  Enables programmable power modes (nap, doze, or sleep mode).
                This bit controls the programmable power modes only; it has no effect on dynamic power management (DPM). MSR[POW] may be altered with an mtmsr instruction only. Also, when altering the POW bit, software may alter only this bit in the MSR and no others. The mtmsr instruction must be followed by a context-synchronizing instruction. See Chapter 9, "Power Management," for more information.
14      TGPR    Temporary GPR remapping (603e-specific)
                  0  Normal operation.
                  1  TGPR mode. GPR0–GPR3 are remapped to TGPR0–TGPR3 for use by TLB miss routines.
                The contents of GPR0–GPR3 remain unchanged while MSR[TGPR] = 1. Attempts to use GPR4–GPR31 with MSR[TGPR] = 1 yield undefined results. When this bit is set, all instruction accesses to GPR0–GPR3 are mapped to TGPR0–TGPR3, respectively. The TGPR bit is set when either an instruction TLB miss, data read miss, or data write miss exception is taken. The TGPR bit is cleared by an rfi instruction.
15      ILE     Exception little-endian mode. When an exception occurs, this bit is copied into MSR[LE] to select the endian mode for the context established by the exception.
16      EE      External interrupt enable
                  0  The processor ignores external interrupts, system management interrupts, and decrementer interrupts.
                  1  The processor is enabled to take an external interrupt, system management interrupt, or decrementer interrupt.
17      PR      Privilege level
                  0  The processor can execute both user- and supervisor-level instructions.
                  1  The processor can only execute user-level instructions.
18      FP      Floating-point available
                  0  The processor prevents dispatch of floating-point instructions, including floating-point loads, stores, and moves. This is the default state for the EC603e microprocessor.
                  1  The processor can execute floating-point instructions, and can take floating-point enabled exception type program exceptions.
19      ME      Machine check enable
                  0  Machine check exceptions are disabled.
                  1  Machine check exceptions are enabled.
20      FE0     Floating-point exception mode 0 (see Table 4-6). Not supported on the EC603e microprocessor.
21      SE      Single-step trace enable
                  0  The processor executes instructions normally.
                  1  The processor generates a trace exception upon the successful completion of the next instruction.
22      BE      Branch trace enable
                  0  The processor executes branch instructions normally.
                  1  The processor generates a trace exception upon the successful completion of a branch instruction.
23      FE1     Floating-point exception mode 1 (see Table 4-6). Not supported on the EC603e microprocessor.
24      (none)  Reserved. Full function.
25      IP      Exception prefix. The setting of this bit specifies whether an exception vector offset is prepended with Fs or 0s. In the following description, nnnnn is the offset of the exception. See Figure 4-1.
                  0  Exceptions are vectored to the physical address 0x000n_nnnn.
                  1  Exceptions are vectored to the physical address 0xFFFn_nnnn.
26      IR      Instruction address translation
                  0  Instruction address translation is disabled.
                  1  Instruction address translation is enabled.
                For more information, see Chapter 5, "Memory Management."
27      DR      Data address translation
                  0  Data address translation is disabled.
                  1  Data address translation is enabled.
                For more information, see Chapter 5, "Memory Management."
28–29   (none)  Reserved. Full function.
30      RI      Recoverable exception (for system reset and machine check exceptions)
                  0  Exception is not recoverable.
                  1  Exception is recoverable.
31      LE      Little-endian mode enable
                  0  The processor runs in big-endian mode.
                  1  The processor runs in little-endian mode.
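As a reading aid for Table 4-5, the following Python sketch maps the named MSR fields to their bit positions and lists which ones are set in a given value. It is a minimal illustration, not part of the manual: the names MSR_FIELDS, msr_field, and set_fields are hypothetical, and bit 0 is taken to be the most-significant bit. The example value 0x00000040 is the MSR setting that Table 4-8 lists after a hard reset.

```python
# Bit positions of the named MSR fields from Table 4-5 (bit 0 = MSB).
MSR_FIELDS = {
    "POW": 13, "TGPR": 14, "ILE": 15, "EE": 16, "PR": 17, "FP": 18,
    "ME": 19, "FE0": 20, "SE": 21, "BE": 22, "FE1": 23, "IP": 25,
    "IR": 26, "DR": 27, "RI": 30, "LE": 31,
}


def msr_field(msr: int, name: str) -> int:
    """Return the value (0 or 1) of one named MSR field."""
    return (msr >> (31 - MSR_FIELDS[name])) & 1


def set_fields(msr: int) -> list:
    """Return the names of all MSR fields that are set, in bit order."""
    return [name for name in MSR_FIELDS if msr_field(msr, name)]


if __name__ == "__main__":
    print(set_fields(0x00000040))  # hard reset value: only IP is set
    print(set_fields(0x0000B030))  # EE, FP, ME, IR, and DR are set
```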
The IEEE floating-point exception mode bits (FE0 and FE1) together define whether floating-point exceptions are handled precisely, imprecisely, or whether they are taken at all. (Note that FE0 and FE1 are not supported on the EC603e microprocessor.) The possible settings and default conditions for the 603e are shown in Table 4-6. For further details, see Chapter 6, "Exceptions," of The Programming Environments Manual.

MSR bits are guaranteed to be written to SRR1 when the first instruction of the exception handler is encountered.

4.2.1 Enabling and Disabling Exceptions

When a condition exists that may cause an exception to be generated, it must be determined whether the exception is enabled for that condition.

- IEEE floating-point enabled exceptions (a type of program exception) are ignored when both MSR[FE0] and MSR[FE1] are cleared. If either of these bits is set, all IEEE enabled floating-point exceptions are taken and cause a program exception. (Not supported on the EC603e microprocessor.)

- Asynchronous, maskable exceptions (that is, the external, system management, and decrementer interrupts) are enabled by setting the MSR[EE] bit. When MSR[EE] = 0, recognition of these exception conditions is delayed. MSR[EE] is cleared automatically when an exception is taken, to delay recognition of conditions causing those exceptions.

- A machine check exception can occur only if the machine check enable bit, MSR[ME], is set. If MSR[ME] is cleared, the processor goes directly into checkstop state when a machine check exception condition occurs. Individual machine check exceptions can be enabled and disabled through bits in the HID0 register, which is described in Table 2-2.

- System reset exceptions cannot be masked.

Table 4-6. IEEE Floating-Point Exception Mode Bits

FE0  FE1  Mode
0    0    Floating-point exceptions disabled
0    1    Floating-point imprecise nonrecoverable*
1    0    Floating-point imprecise recoverable*
1    1    Floating-point precise mode

* Not implemented in the 603e
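The FE0/FE1 encoding in Table 4-6 amounts to a two-bit lookup. The sketch below is an illustration only; the names FE_MODES and fp_exception_mode are hypothetical, and the bit positions (FE0 = MSR bit 20, FE1 = MSR bit 23, with bit 0 as the most-significant bit) are taken from Table 4-5.

```python
# Mode names from Table 4-6, keyed by (FE0, FE1).
FE_MODES = {
    (0, 0): "floating-point exceptions disabled",
    (0, 1): "floating-point imprecise nonrecoverable (not implemented in the 603e)",
    (1, 0): "floating-point imprecise recoverable (not implemented in the 603e)",
    (1, 1): "floating-point precise mode",
}


def fp_exception_mode(msr: int) -> str:
    """Look up the floating-point exception mode selected by MSR[FE0] and MSR[FE1]."""
    fe0 = (msr >> (31 - 20)) & 1  # MSR bit 20
    fe1 = (msr >> (31 - 23)) & 1  # MSR bit 23
    return FE_MODES[(fe0, fe1)]


if __name__ == "__main__":
    print(fp_exception_mode(0x00000000))  # both bits clear
    print(fp_exception_mode(0x00000900))  # FE0 and FE1 both set
```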
4.2.2 Steps for Exception Processing

After it is determined that the exception can be taken (by confirming that any instruction-caused exceptions occurring earlier in the instruction stream have been handled, and by confirming that the exception is enabled for the exception condition), the processor does the following:

1. The machine status save/restore register 0 (SRR0) is loaded with an instruction address that depends on the type of exception. See the individual exception description for details about how this register is used for specific exceptions.

2. Bits 1–4 and 10–15 of SRR1 are loaded with information specific to the exception type.

3. Bits 5–9 and 16–31 of SRR1 are loaded with a copy of the corresponding bits of the MSR.

4. The MSR is set as described in Table 4-5. The new values take effect beginning with the fetching of the first instruction of the exception-handler routine located at the exception vector address. Note that MSR[IR] and MSR[DR] are cleared for all exception types; therefore, address translation is disabled for both instruction fetches and data accesses beginning with the first instruction of the exception-handler routine.

5. Instruction fetch and execution resumes, using the new MSR value, at a location specific to the exception type. The location is determined by adding the exception's vector (see Figure 4-1) to the base address determined by MSR[IP]. If IP is cleared, exceptions are vectored to the physical address 0x000n_nnnn. If IP is set, exceptions are vectored to the physical address 0xFFFn_nnnn. For a machine check exception that occurs when MSR[ME] = 0 (machine check exceptions are disabled), the processor enters the checkstop state (the machine stops executing instructions). See Section 4.5.2, "Machine Check Exception (0x00200)."
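Steps 1 through 3 and step 5 above can be modeled in a few lines. The following Python sketch is a simplified illustration under stated assumptions, not a description of the hardware: the function and constant names are hypothetical, bit 0 is the most-significant bit, SRR1 bit 0 is simply left cleared, and the MSR update in step 4 is not modeled.

```python
def field_mask(first: int, last: int) -> int:
    """Mask covering bits first..last of a 32-bit register (bit 0 = MSB)."""
    return sum(1 << (31 - b) for b in range(first, last + 1))


SRR1_INFO_MASK = field_mask(1, 4) | field_mask(10, 15)  # step 2: exception-specific bits
SRR1_MSR_MASK = field_mask(5, 9) | field_mask(16, 31)   # step 3: bits copied from the MSR
MSR_IP = 1 << (31 - 25)                                 # exception prefix bit


def exception_entry(msr: int, resume_address: int, vector_offset: int, info: int = 0):
    """Return (srr0, srr1, handler_address) for an exception taken with this MSR."""
    srr0 = resume_address & 0xFFFFFFFF                        # step 1
    srr1 = (info & SRR1_INFO_MASK) | (msr & SRR1_MSR_MASK)    # steps 2 and 3
    base = 0xFFF00000 if msr & MSR_IP else 0x00000000         # step 5: base from MSR[IP]
    return srr0, srr1, base | vector_offset


if __name__ == "__main__":
    srr0, srr1, handler = exception_entry(0x0000B030, 0x00001000, 0x00300)
    print(hex(srr0), hex(srr1), hex(handler))
```

With MSR[IP] cleared, as in the example call, the handler address is simply the vector offset; setting MSR[IP] moves the same offset into the 0xFFFn_nnnn range.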
4.2.3 Setting MSR[RI]

The operating system should handle MSR[RI] as follows:

- In the machine check and system reset exceptions: If SRR1[RI] is cleared, the exception is not recoverable. If it is set, the exception is recoverable with respect to the processor.

- In each exception handler: When enough state information has been saved that a machine check or system reset exception can reconstruct the previous state, set MSR[RI].

- In each exception handler: Clear MSR[RI], set the SRR0 and SRR1 registers appropriately, and then execute rfi.

Note that the RI bit being set indicates that, with respect to the processor, enough processor state data is valid for the processor to continue, but it does not guarantee that the interrupted process can resume.
4.2.4 Returning from an Exception Handler

The return from interrupt (rfi) instruction performs context synchronization by allowing previously issued instructions to complete before returning to the interrupted process. In general, execution of the rfi instruction ensures the following:

- All previous instructions have completed to a point where they can no longer cause an exception. If a previous instruction causes a direct-store interface error exception, the results must be determined before this instruction is executed.

- Previous instructions complete execution in the context (privilege, protection, and address translation) under which they were issued.

- The rfi instruction copies SRR1 bits back into the MSR.

- The instructions following this instruction execute in the context established by this instruction.

For a complete description of context synchronization, refer to Chapter 6, "Exceptions," of The Programming Environments Manual.

4.3 Process Switching

The operating system should execute one of the following when processes are switched:

- The sync instruction, which orders the effects of instruction execution. All instructions previously initiated appear to have completed before the sync instruction completes, and no subsequent instructions appear to be initiated until the sync instruction completes. For an example showing use of the sync instruction, see Chapter 2, "PowerPC Register Set," of The Programming Environments Manual.

- The isync instruction, which waits for all previous instructions to complete and then discards any fetched instructions, causing subsequent instructions to be fetched (or refetched) from memory and to execute in the context (privilege, translation, protection, etc.) established by the previous instructions.

- The stwcx. instruction, to clear any outstanding reservations, which ensures that an lwarx instruction in the old process is not paired with an stwcx. instruction in the new process.

The operating system should set the MSR[RI] bit as described in Section 4.2.3, "Setting MSR[RI]."
4.4 Exception Latencies

Latencies for taking various exceptions depend on the state of the machine when the exception conditions occur. This latency may be as short as one cycle, in which case an exception is signaled in the cycle following the appearance of the exception condition. The latencies are as follows:

- Hard reset and machine check: In most cases, a hard reset or machine check exception will have a single-cycle latency. A two-to-three-cycle delay may occur only when a predicted instruction is next to complete, and the branch guess that forced this instruction to be predicted was resolved to be incorrect.

- Soft reset: The latency of a soft reset exception is affected by recoverability. The time to reach a recoverable state may depend on the time needed to complete or except an instruction at the point of completion, the time needed to drain the completed store queue, and the time waiting for a correct empty state so that a valid MSR[IP] may be saved. For lower-priority externally-generated interrupts, a delay may be incurred waiting for another interrupt, generated while reaching a recoverable state, to be serviced.

Further delays are possible for other types of exceptions depending on the number and type of instructions that must be completed before those exceptions may be serviced. See Section 4.1.2, "Summary of Front-End Exception Handling," to determine possible maximum latencies for different exceptions.

4.5 Exception Definitions

Table 4-7 shows all the types of exceptions that can occur with the 603e and the MSR bit settings when the processor transitions to supervisor mode. The state of these bits prior to the exception is typically stored in SRR1.

Table 4-7. MSR Setting Due to Exception

Exception type                    POW    TGPR   ILE    EE     PR     FP(1)  ME     FE0(2) SE     BE     FE1(2) IP     IR     DR     RI     LE
System reset                      0      0      -      0      0      0      -      0      0      0      0      -      0      0      0      ILE
Machine check                     0      0      -      0      0      0      0      0      0      0      0      -      0      0      0      ILE
DSI                               0      0      -      0      0      0      -      0      0      0      0      -      0      0      0      ILE
ISI                               0      0      -      0      0      0      -      0      0      0      0      -      0      0      0      ILE
External                          0      0      -      0      0      0      -      0      0      0      0      -      0      0      0      ILE
Alignment                         0      0      -      0      0      0      -      0      0      0      0      -      0      0      0      ILE
Program                           0      0      -      0      0      0      -      0      0      0      0      -      0      0      0      ILE
Floating-point unavailable (3)    0      0      -      0      0      0      -      0      0      0      0      -      0      0      0      ILE
Table 4-7. MSR Setting Due to Exception (continued)

Exception type                    POW    TGPR   ILE    EE     PR     FP(1)  ME     FE0(2) SE     BE     FE1(2) IP     IR     DR     RI     LE
Decrementer                       0      0      -      0      0      0      -      0      0      0      0      -      0      0      0      ILE
System call                       0      0      -      0      0      0      -      0      0      0      0      -      0      0      0      ILE
Trace exception                   0      0      -      0      0      0      -      0      0      0      0      -      0      0      0      ILE
ITLB miss                         0      1      -      0      0      0      -      0      0      0      0      -      0      0      0      ILE
DTLB miss on load                 0      1      -      0      0      0      -      0      0      0      0      -      0      0      0      ILE
DTLB miss on store                0      1      -      0      0      0      -      0      0      0      0      -      0      0      0      ILE
Instruction address breakpoint    0      0      -      0      0      0      -      0      0      0      0      -      0      0      0      ILE
System management interrupt       0      0      -      0      0      0      -      0      0      0      0      -      0      0      0      ILE

0    Bit is cleared
1    Bit is set
ILE  Bit is copied from the ILE bit in the MSR
-    Bit is not altered
Reserved bits are read as if written as 0.

Notes:
1. The floating-point available bit is always set to 0 on the EC603e microprocessor.
2. FE0 and FE1 are not supported on the EC603e microprocessor.
3. On the EC603e microprocessor, the floating-point unavailable exception is caused by the execution of a floating-point instruction.

4.5.1 Reset Exceptions (0x00100)

The system reset exception is a nonmaskable, asynchronous exception signaled to the 603e either through the assertion of the reset signals (SRESET or HRESET) or internally during the power-on reset (POR) process. The assertion of the soft reset signal, SRESET, as described in Section 7.2.9.6.2, "Soft Reset (SRESET) Input," causes the soft reset exception to be taken and the physical base address of the handler is determined by the MSR[IP] bit. The assertion of the hard reset signal, HRESET, as described in Section 7.2.9.6.1, "Hard Reset (HRESET) Input," causes the hard reset exception to be taken and the physical address of the handler is always 0xFFF0_0100.
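The MSR update rule summarized in Table 4-7 can be expressed compactly. The Python sketch below is an illustration under assumptions, not a statement of the hardware behavior: it assumes that ILE, ME, and IP are the only bits left unaltered (ME being cleared for a machine check), that TGPR is set only for the three TLB miss exceptions, and that LE receives the value of ILE. The function name and the exception-name strings are hypothetical.

```python
def ppc_bit(n: int) -> int:
    """Mask for bit n of a 32-bit register, where bit 0 is the MSB."""
    return 1 << (31 - n)


MSR_TGPR, MSR_ILE, MSR_ME = ppc_bit(14), ppc_bit(15), ppc_bit(19)
MSR_IP, MSR_LE = ppc_bit(25), ppc_bit(31)

TLB_MISSES = {"itlb miss", "dtlb miss on load", "dtlb miss on store"}


def msr_after_exception(msr: int, exception: str) -> int:
    """Return the MSR value in effect when the handler starts executing."""
    new = msr & (MSR_ILE | MSR_ME | MSR_IP)  # bits assumed not altered
    if exception == "machine check":
        new &= ~MSR_ME                       # machine check clears ME
    if exception in TLB_MISSES:
        new |= MSR_TGPR                      # TLB miss handlers run with TGPR set
    if msr & MSR_ILE:
        new |= MSR_LE                        # LE is copied from ILE
    return new


if __name__ == "__main__":
    before = 0x0001B071  # ILE, EE, FP, ME, IP, IR, DR, and LE set
    for kind in ("dsi", "machine check", "itlb miss"):
        print(kind, hex(msr_after_exception(before, kind)))
```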
4.5.1.1 Hard Reset and Power-On Reset

As described in Section 4.1.2, "Summary of Front-End Exception Handling," the hard reset exception is a nonrecoverable, nonmaskable asynchronous exception. When HRESET is asserted or at power-on reset (POR), the 603e immediately branches to 0xFFF0_0100 without attempting to reach a recoverable state. A hard reset has the highest priority of any exception. It is always nonrecoverable. Table 4-8 shows the state of the machine just before it fetches the first instruction of the system reset handler after a hard reset.

The HRESET signal can be asserted for the following reasons:

- System power-on reset
- System reset from a panel switch
- An action required by the ESP utility

For information on the HRESET signal, see Section 7.2.9.6.1, "Hard Reset (HRESET) Input."

Table 4-8. Settings Caused by Hard Reset

Register   Setting      Register           Setting
GPRs       Unknown      PVR                0003000n
FPRs*      Unknown      HID0               00000000
FPSCR*     00000000     HID1               00000000
CR         All 0s       DMISS and IMISS    All 0s
SRs        Unknown      DCMP and ICMP      All 0s
MSR        00000040     RPA                All 0s
XER        00000000     IABR               All 0s
TBU        00000000     DSISR              00000000
TBL        00000000     DAR                00000000
LR         00000000     DEC                FFFFFFFF
CTR        00000000     HASH1              00000000
SDR1       00000000     HASH2              00000000
SRR0       00000000     TLBs               Unknown
SRR1       00000000     Cache              All cache blocks invalidated
SPRGs      00000000     BATs               Unknown
Tag directory: All 0s. (However, LRU bits are initialized so each side of the cache has a unique LRU value.)

* Not supported on the EC603e microprocessor.
The following is also true after a hard reset operation:

- External checkstops are enabled.
- The on-chip test interface has given control of the I/Os to the rest of the chip for functional use.
- Since the reset exception has data and instruction translation disabled (MSR[DR] and MSR[IR] both cleared), the chip operates in real addressing mode as described in Section 5.2, "Real Addressing Mode."

4.5.1.2 Soft Reset

As described in Section 4.1.2, "Summary of Front-End Exception Handling," the soft reset exception is a type of system reset exception that is recoverable, nonmaskable, and asynchronous. When SRESET is asserted, the processor attempts to reach a recoverable state by allowing the next instruction to either complete or cause an exception, blocking the completion of subsequent instructions, and allowing the completed store queue to drain.

Unlike a hard reset, the latches are not initialized and the instruction cache is disabled. The SRESET signal must be asserted for at least two bus clock cycles. After the SRESET signal is negated, the 603e vectors to the system reset routine at 0x0000_0100 if MSR[IP] is cleared or 0xFFF0_0100 if MSR[IP] is set. A soft reset is recoverable provided that attaining the recoverable state does not cause a machine check exception. This exception case is third in priority, following hard reset and machine check.

When a soft reset occurs, registers are set as shown in Table 4-9.

Table 4-9. Soft Reset Exception: Register Settings

Register  Setting description
SRR0      Set to the effective address of the instruction that the processor would have attempted to complete next if no exception conditions were present.
SRR1      0–15   Cleared
          16–31  Loaded from bits 16–31 of the MSR
          Note that if the processor state is corrupted to the extent that execution cannot be reliably restarted, SRR1[30] is cleared.
MSR       POW     0        FE0(2)  0
          TGPR    0        SE      0
          ILE     -        BE      0
          EE      0        FE1(2)  0
          PR      0        IP      -
          FP(1)   0        IR      0
          ME      -        DR      0
                           RI      0
                           LE      Set to value of ILE
          (- indicates that the bit is not altered.)

Notes:
1. The floating-point available bit is always set to 0 on the EC603e microprocessor.
2. FE0 and FE1 are not supported on the EC603e microprocessor.
4.5.2 Machine Check Exception (0x00200)

The 603e conditionally initiates a machine check exception after detecting the assertion of the TEA or MCP signals on the 603e bus (assuming the machine check is enabled, MSR[ME] = 1). The assertion of one of these signals indicates that a bus error occurred and the system terminates the current transaction. One clock cycle after the signal is asserted, the data bus signals go to the high-impedance state; however, data entering the GPR or the cache is not invalidated. Note that if HID0[EMCP] is cleared, the processor ignores the assertion of the MCP signal.

Note that the 603e makes no attempt to force recoverability; however, it does guarantee the machine check exception is always taken immediately upon request, with a nonpredicted address saved in SRR0, regardless of the current machine state. Any pending stores in the completed store queue are canceled when the exception is taken. Software can use the machine check exception in a recoverable mode for checking bus configuration. For this case, a sync, load, sync instruction sequence is used. A subsequent machine check exception at the load address indicates a bus configuration problem and the processor is in a recoverable state.

If the MSR[ME] bit is set, the exception is recognized and handled; otherwise, the 603e attempts to enter an internal checkstop. Note that the resulting machine check exception has priority over any exceptions caused by the instruction that generated the bus operation.

Machine check exceptions are only enabled when MSR[ME] = 1; this is described in Section 4.5.2.1, "Machine Check Exception Enabled (MSR[ME] = 1)." If MSR[ME] = 0 and a machine check occurs, the processor enters the checkstop state. Checkstop state is described in Section 4.5.2.2, "Checkstop State (MSR[ME] = 0)."
4.5.2.1 Machine Check Exception Enabled (MSR[ME] = 1)

When a machine check exception is taken, registers are updated as shown in Table 4-10.

Table 4-10. Machine Check Exception: Register Settings

Register  Setting description
SRR0      Set to the address of the next instruction that would have been completed in the interrupted instruction stream. Neither this instruction nor any others beyond it will have been completed. All preceding instructions will have been completed.
SRR1      0–11   Cleared
          12     MCP: Machine check signal caused exception
          13     TEA: Transfer error acknowledge signal caused exception
          14     DPE: Data parity error signal caused exception
          15     APE: Address parity error signal caused exception
          16–31  Loaded from MSR[16–31]
MSR       POW     0        FE0(2)  0
          TGPR    0        SE      0
          ILE     -        BE      0
          EE      0        FE1(2)  0
          PR      0        IP      -
          FP(1)   0        IR      0
          ME      0        DR      0
                           RI      0
                           LE      Set to value of ILE
          (- indicates that the bit is not altered.)
          Note that when a machine check exception is taken, the exception handler should set MSR[ME] as soon as it is practical to handle another TEA assertion. Otherwise, subsequent TEA assertions cause the processor to automatically enter the checkstop state.

Notes:
1. The floating-point available bit is always cleared to 0 on the EC603e microprocessor.
2. FE0 and FE1 are not supported on the EC603e microprocessor.

When a machine check exception is taken, instruction execution for the handler begins at offset 0x00200 from the physical base address indicated by MSR[IP].

In order to return to the main program, the exception handler should do the following:

1. SRR0 and SRR1 should be given the values to be used by the rfi instruction.
2. Execute rfi.

4.5.2.2 Checkstop State (MSR[ME] = 0)

When the 603e enters the checkstop state, it asserts the checkstop output signal, CKSTP_OUT. The following events will cause the 603e to enter the checkstop state:

- Machine check exception occurs with MSR[ME] cleared.
- External checkstop input, CKSTP_IN, is asserted.
- An extended transfer protocol error occurs.

When a processor is in the checkstop state, instruction processing is suspended and generally cannot be restarted without resetting the processor. The contents of all latches are frozen within two cycles upon entering the checkstop state so that the state of the processor can be analyzed as an aid in problem determination.
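The choice between taking a machine check exception and entering the checkstop state, as described in Section 4.5.2, reduces to a short decision. The following Python sketch is only an illustration: the function name, its arguments, and the returned strings are hypothetical, HID0[EMCP] is passed in as a boolean rather than decoded from HID0, and other checkstop causes (CKSTP_IN, protocol errors) are not modeled.

```python
MSR_ME = 1 << (31 - 19)  # machine check enable, MSR bit 19 (bit 0 = MSB)


def machine_check_outcome(msr: int, tea: bool = False, mcp: bool = False,
                          emcp: bool = True) -> str:
    """Classify the processor response to TEA/MCP assertions."""
    # MCP is ignored when HID0[EMCP] is cleared; TEA is always honored here.
    signaled = tea or (mcp and emcp)
    if not signaled:
        return "no machine check condition"
    if msr & MSR_ME:
        return "machine check exception"  # handler at offset 0x00200
    return "checkstop"


if __name__ == "__main__":
    print(machine_check_outcome(MSR_ME, tea=True))
    print(machine_check_outcome(0, tea=True))
    print(machine_check_outcome(MSR_ME, mcp=True, emcp=False))
```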
Note that not all PowerPC processors provide the same level of error checking. The reasons a processor can enter checkstop state are implementation-dependent.

4.5.3 DSI Exception (0x00300)

A DSI exception occurs when no higher priority exception exists and a data memory access cannot be performed. The condition that caused the DSI exception can be determined by reading the DSISR register, a supervisor-level SPR (SPR18) that can be read by using the mfspr instruction. Bit settings are provided in Table 4-11. Table 4-11 also indicates which memory element is saved to the DAR. DSI exceptions can occur for any of the following reasons:

- The instruction is not supported for the type of memory addressed.
- Any access to a direct-store segment (SR[T] = 1).
- The access violates memory protection. Access is not permitted by the key (Ks and Kp) and PP bits, which are set in the segment register and PTE for page protection and in the BATs for block protection.

Note that the OEA specifies an additional case that may cause a DSI exception: when an effective address for a load, store, or cache operation cannot be translated by the TLBs. On the 603e, this condition causes a TLB miss exception instead.

These scenarios are common among all PowerPC processors. The following additional scenarios can cause a DSI exception in the 603e:

- A bus error indicates crossing from a direct-store segment to a memory segment.
- The execution of any load/store instruction to a direct-store segment, SR[T] = 1.
- A data access crosses from a memory segment (SR[T] = 0) into a direct-store segment (SR[T] = 1).

DSI exceptions can be generated by load/store instructions, and the cache control instructions (dcbi, dcbz, dcbst, and dcbf).

The 603e supports the crossing of page boundaries. However, if the second page has a translation error or protection violation associated with it, the 603e will take the DSI exception in the middle of the instruction. In this case, the data address register (DAR) always points to a byte address in the first word of the offending page.

If an stwcx. instruction has an effective address for which a normal store operation would cause a DSI exception, the 603e will take the DSI exception without checking for the reservation.

If the XER indicates that the byte count for an lswi or stswi instruction is zero, a DSI exception does not occur, regardless of the effective address.

The condition that caused the exception is defined in the DSISR. These conditions also use the data address register (DAR) as shown in Table 4-11.
when a dsi exception is taken, instruction execution for the handler begins at offset 0x00300 from the physical base address indicated by msr[ip].

the architecture permits certain instructions to be partially executed when they cause a dsi exception. these are as follows:
- load multiple or load string instructions: some registers in the range of registers to be loaded may have been loaded.
- store multiple or store string instructions: some bytes of memory in the range addressed may have been updated.

in these cases, the number of registers and amount of memory altered are instruction- and boundary-dependent. however, memory protection is not violated. furthermore, if some of the data accessed is in direct-store space (sr[t] = 1) and the instruction is not supported for direct-store accesses, the locations in direct-store space are not accessed. for update forms, the update register (rA) is not altered.

table 4-11. dsi exception: register settings

register   setting description
srr0       set to the effective address of the instruction that caused the exception.
srr1       0-15: cleared
           16-31: loaded with bits 16-31 of the msr
msr        pow 0; tgpr 0; ile not altered; ip not altered; ee 0; pr 0; fp 0 (note 1); me not altered; fe0 0 (note 2); se 0; be 0; fe1 0 (note 2); ir 0; dr 0; ri 0; le set to value of ile
dsisr      0: set if a load or store instruction results in a direct-store error exception.
           1: set by the data tlb miss exception handler if the translation of an attempted access is not found in the primary hash table entry group (hteg), or in the rehashed secondary hteg, or in the range of a dbat register; otherwise cleared.
           2-3: cleared
           4: set if a memory access is not permitted by the page or bat protection mechanism; otherwise cleared.
           5: set if the lwarx or stwcx. instruction is attempted to direct-store space.
           6: set for a store operation and cleared for a load operation.
           7-31: cleared
dar        set to the effective address of a memory element as described in the following list:
           - a byte in the first word accessed in the page that caused the dsi exception, for a byte, half word, or word memory access.
           - a byte in the first word accessed in the bat area that caused the dsi exception, for a byte, half word, or word access to a bat area.
           - a byte in the block that caused the exception for icbi, dcbz, dcbst, dcbf, or dcbi instructions.
           - any ea in the memory range addressed (for direct-store exceptions).

notes:
1. the floating-point available bit is always cleared to 0 on the EC603E microprocessor.
2. fe0 and fe1 are not supported on the EC603E microprocessor.
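the dsisr bit assignments in table 4-11 can be read back in a dsi handler or a simulator roughly as follows. this is a minimal sketch, not code from the manual: the function and key names are hypothetical, and the only assumption beyond the table is powerpc bit numbering, in which bit 0 is the most significant bit of the 32-bit register.

```python
def bit(value, n):
    """return bit n of a 32-bit value in powerpc numbering (bit 0 = most significant)."""
    return (value >> (31 - n)) & 1


def decode_dsi_dsisr(dsisr):
    """map the dsisr bits of table 4-11 to descriptive flags (names are illustrative)."""
    return {
        "direct_store_error": bool(bit(dsisr, 0)),
        "translation_not_found": bool(bit(dsisr, 1)),
        "protection_violation": bool(bit(dsisr, 4)),
        "lwarx_stwcx_to_direct_store": bool(bit(dsisr, 5)),
        "store_access": bool(bit(dsisr, 6)),
    }


# example: bits 4 and 6 set, that is, a store rejected by page or bat protection
flags = decode_dsi_dsisr(0x0A000000)
```

a handler would typically test the translation-not-found flag first (page fault) and the protection flag second, using the dar value to identify the faulting address.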
4.5.4 isi exception (0x00400)

the isi exception is implemented as it is defined by the powerpc architecture. an isi exception occurs when no higher priority exception exists and an attempt to fetch the next instruction fails for any of the following reasons:
- if an instruction tlb miss fails to find the desired pte, then a page fault is synthesized. the itlb miss handler branches to the isi exception handler to retrieve the translation from a storage device.
- an attempt is made to fetch an instruction from a direct-store segment while instruction translation is enabled (msr[ir] = 1).
- an attempt is made to fetch an instruction from no-execute memory.
- an attempt is made to fetch an instruction from guarded memory when msr[ir] = 1.
- the fetch access violates memory protection.

register settings for this exception are described in chapter 6, "exceptions," in the programming environments manual.

when an isi exception is taken, instruction execution for the handler begins at offset 0x00400 from the physical base address indicated by msr[ip].

4.5.5 external interrupt (0x00500)

an external interrupt is signaled to the 603e by the assertion of the int signal as described in section 7.2.9.1, "interrupt (int): input." the interrupt may not be recognized if a higher priority exception occurs simultaneously or if the msr[ee] bit is cleared when int is asserted.

after the int is detected (and provided that msr[ee] is set), the 603e generates a recoverable halt to instruction completion. the 603e requires the next instruction in program order to complete or take an exception, blocks completion of any following instructions, and allows the completed store queue to drain. if any other exceptions are encountered in this process, they are taken first and the external interrupt is delayed until a recoverable halt is achieved. at this time the 603e saves the state information and takes the external interrupt as defined in the powerpc architecture.
the register settings for the external interrupt are shown in table 4-12. when an external interrupt is taken, instruction execution for the handler begins at offset 0x00500 from the physical base address indicated by msr[ip].

the 603e only recognizes the interrupt condition (int asserted) if the msr[ee] bit is set; it ignores the interrupt condition if the msr[ee] bit is cleared. to guarantee that the external interrupt is taken, the int signal must be held active until the 603e takes the interrupt. if the int signal is negated before the interrupt is taken, the 603e is not guaranteed to take an external interrupt. the interrupt handler must send a command to the device that asserted int, acknowledging the interrupt and instructing the device to negate int.

4.5.6 alignment exception (0x00600)

this section describes conditions that can cause alignment exceptions in the 603e. similar to dsi exceptions, alignment exceptions use srr0 and srr1 to save the machine state and the dsisr to determine the source of the exception. the 603e will initiate an alignment exception when it detects any of the following conditions:
- the operand of a floating-point load or store operation is not word-aligned. (not supported on the EC603E microprocessor.)
- the operand of an lmw, stmw, lwarx, or stwcx. instruction is not word-aligned.
- a little-endian access (msr[le] = 1) is misaligned.
- a multiple or string access is attempted with the msr[le] bit set.
- the operand of a dcbz instruction is in a page that is write-through or caching-inhibited.

table 4-12. external interrupt: register settings

register   setting
srr0       set to the effective address of the instruction that the processor would have attempted to execute next if no interrupt conditions were present.
srr1       0-15: cleared
           16-31: loaded from bits 16-31 of the msr
msr        pow 0; tgpr 0; ile not altered; ip not altered; ee 0; pr 0; fp 0 (note 1); me not altered; fe0 0 (note 2); se 0; be 0; fe1 0 (note 2); ir 0; dr 0; ri 0; le set to value of ile

notes:
1. the floating-point available bit is always cleared to 0 on the EC603E microprocessor.
2. fe0 and fe1 are not supported on the EC603E microprocessor.
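the srr1 and msr settings of table 4-12 can be modeled as a small state update. the following is a sketch for illustration only. the msr bit positions and the two base addresses selected by msr[ip] (0x00000000 and 0xfff00000) are assumptions taken from the general powerpc msr definition, not from this section, and the function names are hypothetical.

```python
# msr bit positions in powerpc numbering (bit 0 is the most significant bit).
# assumption: taken from the msr definition, which this section does not restate.
MSR_BIT = {"POW": 13, "TGPR": 14, "ILE": 15, "EE": 16, "PR": 17, "FP": 18,
           "ME": 19, "FE0": 20, "SE": 21, "BE": 22, "FE1": 23, "IP": 25,
           "IR": 26, "DR": 27, "RI": 30, "LE": 31}

# bits that table 4-12 shows as 0 after the exception is taken
# (le is cleared here and then reloaded from ile below)
CLEARED_ON_ENTRY = ("POW", "TGPR", "EE", "PR", "FP", "FE0", "SE", "BE",
                    "FE1", "IR", "DR", "RI", "LE")


def msr_mask(name):
    """return the 32-bit mask for a named msr bit."""
    return 1 << (31 - MSR_BIT[name])


def take_exception(msr, vector_offset):
    """return (srr1, new_msr, handler_address) for an exception taken with the given msr."""
    srr1 = msr & 0x0000FFFF              # bits 0-15 cleared, bits 16-31 copied from the msr
    new_msr = msr & 0xFFFFFFFF
    for name in CLEARED_ON_ENTRY:
        new_msr &= ~msr_mask(name)
    if msr & msr_mask("ILE"):            # le is set to the value of ile
        new_msr |= msr_mask("LE")
    base = 0xFFF00000 if msr & msr_mask("IP") else 0x00000000
    return srr1, new_msr, base + vector_offset


# example: external interrupt taken with ee, me, ip, ir, and dr set
srr1, new_msr, handler = take_exception(0x00009070, 0x00500)
```

ile, ip, and me pass through unchanged, which is why the handler address can be computed from the msr value either before or after the update.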
the register settings for alignment exceptions are shown in table 4-13.

the architecture does not support the use of an unaligned ea by lwarx or stwcx. instructions. if one of these instructions specifies an unaligned ea, the exception handler should not emulate the instruction, but should treat the occurrence as a programming error.

4.5.6.1 integer alignment exceptions

the 603e is optimized for load and store operations that are aligned on natural boundaries. operations that are not naturally aligned may suffer performance degradation, depending on the type of operation, the boundaries crossed, and the mode that the processor is in during execution. more specifically, these operations may either cause an alignment exception or they may cause the processor to break the memory access into multiple, smaller accesses with respect to the cache and the memory subsystem.

table 4-13. alignment interrupt: register settings

register   setting
srr0       set to the effective address of the instruction that caused the exception.
srr1       0-15: cleared
           16-31: loaded from bits 16-31 of the msr
msr        pow 0; tgpr 0; ile not altered; ip not altered; ee 0; pr 0; fp 0 (note 1); me not altered; fe0 0 (note 2); se 0; be 0; fe1 0 (note 2); ir 0; dr 0; ri 0; le set to value of ile
dsisr      0-11: cleared
           12-13: cleared. (note that these bits can be set by several 64-bit powerpc instructions that are not supported in the 603e.)
           14: cleared
           15-16: for instructions that use register indirect with index addressing, set to bits 29-30 of the instruction. for instructions that use register indirect with immediate index addressing, cleared.
           17: for instructions that use register indirect with index addressing, set to bit 25 of the instruction. for instructions that use register indirect with immediate index addressing, set to bit 5 of the instruction.
           18-21: for instructions that use register indirect with index addressing, set to bits 21-24 of the instruction. for instructions that use register indirect with immediate index addressing, set to bits 1-4 of the instruction.
           22-26: set to bits 6-10 (identifying either the source or destination) of the instruction. undefined for dcbz.
           27-31: set to bits 11-15 of the instruction (rA). set to either bits 11-15 of the instruction or to any register number not in the range of registers loaded by a valid form instruction, for lmw, lswi, and lswx instructions. otherwise undefined.
dar        set to the ea of the data access as computed by the instruction causing the alignment exception.

notes:
1. the floating-point available bit is always cleared to 0 on the EC603E microprocessor.
2. fe0 and fe1 are not supported on the EC603E microprocessor.
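the mapping from instruction bits to dsisr bits 15-31 in table 4-13 can be sketched as below, which may help when writing an alignment handler that must identify the faulting instruction without re-fetching it. the helper names are hypothetical. one assumption is not stated in the table: an instruction is treated as using register indirect with index addressing when its primary opcode (instruction bits 0-5) is 31.

```python
def field(value, first, last):
    """extract bits first..last of a 32-bit value in powerpc numbering (bit 0 = most significant)."""
    width = last - first + 1
    return (value >> (31 - last)) & ((1 << width) - 1)


def alignment_dsisr(instr):
    """build dsisr bits 15-31 from a load/store instruction word, following table 4-13."""
    indexed = field(instr, 0, 5) == 31       # assumption: opcode 31 marks the indexed forms
    if indexed:
        b15_16 = field(instr, 29, 30)
        b17 = field(instr, 25, 25)
        b18_21 = field(instr, 21, 24)
    else:
        b15_16 = 0
        b17 = field(instr, 5, 5)
        b18_21 = field(instr, 1, 4)
    reg = field(instr, 6, 10)                # source or destination register
    ra = field(instr, 11, 15)                # rA
    return (b15_16 << 15) | (b17 << 14) | (b18_21 << 10) | (reg << 5) | ra


# example: lmw r29,0(r3) has primary opcode 46, register field 29, rA = 3, displacement 0
lmw = (46 << 26) | (29 << 21) | (3 << 16)
dsisr = alignment_dsisr(lmw)
```

in the other direction, a handler can recover the register numbers with field(dsisr, 22, 26) and field(dsisr, 27, 31).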
the 603e can initiate an alignment exception for the access shown in table 4-14. in this case, the appropriate range check is performed before the instruction begins execution. as a result, if an alignment exception is taken, it is guaranteed that no portion of the instruction has been executed.

4.5.6.1.1 page address translation access

a page-address translation access occurs when msr[dr] is set, sr[t] is cleared, and there is not a match in the bat. note the following points:
- the following is true for all loads and stores except strings/multiples:
  - byte operands never cause an alignment exception.
  - half-word operands can cause an alignment exception if the ea ends in 0xfff.
  - word operands can cause an alignment exception if the ea ends in 0xffd-0xfff.
  - double-word operands cause an alignment exception if the ea ends in 0xff9-0xfff.
- the dcbz instruction causes an alignment exception if the access is to a page or block with the w (write-through) or i (cache-inhibit) bit set in the tlb or bat, respectively.

a misaligned memory access that does not cause an alignment exception will not perform as well as an aligned access of the same type. the resulting performance degradation due to misaligned accesses depends on how well each individual access behaves with respect to the memory hierarchy. at a minimum, additional cache access cycles are required that can delay other processor resources from using the cache. more dramatically, for an access to a noncacheable page, each discrete access involves individual processor bus operations that reduce the effective bandwidth of that bus.

finally, note that when the 603e is in page address translation mode, there is no special handling for accesses that fall into bat regions.

4.5.6.2 floating-point alignment exceptions

the 603e implements the alignment exception as it is defined in the powerpc architecture. for information on bit settings and how exception conditions are detected, refer to the programming environments manual.

table 4-14. access types

msr[dr]   sr[t]   access type
1         0       page-address translation access
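the ea ranges listed in section 4.5.6.1.1 (0xfff for half words, 0xffd-0xfff for words, 0xff9-0xfff for double words) are exactly the addresses at which an operand would extend past the end of a 4-kbyte page. a minimal sketch of that check follows. the function name is hypothetical, and the check only identifies candidate accesses; whether the 603e actually takes the exception is governed by the conditions in the text.

```python
PAGE_SIZE = 0x1000  # 4-kbyte page


def crosses_page(ea, size):
    """return True if an operand of `size` bytes starting at `ea` extends into the next page."""
    return (ea & (PAGE_SIZE - 1)) + size > PAGE_SIZE


# half word at 0x...fff, word at 0x...ffd, and double word at 0x...ff9 all cross the boundary
examples = [crosses_page(0x00012FFF, 2),
            crosses_page(0x00012FFD, 4),
            crosses_page(0x00012FF9, 8)]
```

a byte operand can never satisfy the condition, which matches the statement that byte operands never cause an alignment exception.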
note that the powerpc architecture allows individual processors to determine whether an exception is required to handle various alignment conditions. the 603e initiates an alignment exception when it detects any of the following conditions:
- the operand of a floating-point load or store operation is not word-aligned.
- the operand of a dcbz instruction is in a page that is write-through or caching-inhibited for a virtual mode access.
- the operand of an lmw, stmw, lwarx, or stwcx. instruction is not word-aligned. note that unlike other alignment exceptions, which store the address as computed by the instruction in the dar, alignment exceptions for load or store multiple instructions store that address value + 4 into the dar.
- a little-endian access is misaligned.
- a multiple access is attempted while the little-endian bit, msr[le], is set.

4.5.7 program exception (0x00700)

the 603e implements the program exception as it is defined by the powerpc architecture (oea). a program exception occurs when no higher priority exception exists and one or more of the exception conditions defined in the oea occur. when a program exception is taken, instruction execution for the handler begins at offset 0x00700 from the physical base address indicated by msr[ip]. the exception conditions are as follows:
- floating-point enabled exception: these exceptions correspond to ieee-defined exception conditions, such as overflows and divide by zeros, that may occur during the execution of a floating-point arithmetic instruction. as a group, these exceptions are enabled by the fe0 and fe1 bits in the msr. individual conditions are enabled by specific bits in the fpscr. for general information about this exception, see the programming environments manual. for more information about how these exceptions are implemented in the 603e, see section 4.5.7.1, "ieee floating-point exception program exceptions." note: the floating-point enabled exception is not supported on the EC603E microprocessor.
- illegal instruction: an illegal instruction program exception is generated when execution of an instruction is attempted with an illegal opcode or illegal combination of opcode and extended opcode fields (including powerpc instructions not implemented in the 603e). these do not include those optional instructions treated as no-ops.
- privileged instruction: a privileged instruction type program exception is generated when the execution of a privileged instruction is attempted and the msr register user privilege bit, msr[pr], is set. in the 603e, this exception is generated for mtspr or mfspr with an invalid spr field if spr[0] = 1 and msr[pr] = 1. this may not be true for all powerpc processors.
- trap: a trap type program exception is generated when any of the conditions specified in a trap instruction is met.

4.5.7.1 ieee floating-point exception program exceptions

floating-point exceptions (not supported on the EC603E microprocessor) are signaled by condition bits set in the floating-point status and control register (fpscr). they can cause the system floating-point enabled exception handler to be invoked. the 603e handles all floating-point exceptions precisely. the 603e implements the fpscr as it is defined by the powerpc architecture; for more information about the fpscr, see the programming environments manual.

floating-point operations that change exception sticky bits in the fpscr may suffer a performance penalty. when an exception is disabled in the fpscr and msr[fe] = 0, updates to the fpscr exception sticky bits are serialized at the completion stage. this serialization may result in a one- or two-cycle execution delay. the penalty is incurred only when the exception bit is changed and not on subsequent operations with the same exception. see chapter 6, "instruction timing," for a full description of completion serialization.

when an exception is enabled in the fpscr, the instruction traps to the emulation trap exception vector without updating the fpscr or the target fpr. the emulation trap exception handler is required to complete the instruction. the emulation trap exception handler is invoked regardless of the fe setting in the msr.

the two ieee floating-point imprecise modes, defined by the powerpc architecture when msr[fe0] ≠ msr[fe1], are treated as precise exceptions (that is, msr[fe0] = msr[fe1] = 1). this is regardless of the setting of msr[ni].

for the highest and most predictable floating-point performance, all exceptions should be disabled in the fpscr and msr. for more information about the program exception, see the programming environments manual.
4.5.7.2 illegal, reserved, and unimplemented instructions program exceptions

in accordance with the powerpc architecture, the 603e considers all instructions defined for 64-bit implementations and unimplemented optional instructions, such as fsqrt, eciwx, and ecowx, as illegal and takes a program exception when one of these instructions is encountered. likewise, if a supervisor-level instruction is encountered when the processor is in user-level mode, a privileged instruction-type program exception is taken.

the 603e implements some instructions, such as double-precision floating-point and load/store string instructions, in software. these instructions take the 603e-specific emulation trap exception (0x01600) rather than a program exception.
4.5.8 floating-point unavailable exception (0x00800)

a floating-point unavailable exception occurs when no higher priority exception exists, an attempt is made to execute a floating-point instruction (including floating-point load, store, and move instructions), and the floating-point available bit in the msr is disabled (msr[fp] = 0); note that on the EC603E microprocessor, msr[fp] is always cleared to 0. register settings for this exception are described in chapter 6, "exceptions," in the programming environments manual.

when a floating-point unavailable exception is taken, instruction execution for the handler begins at offset 0x00800 from the physical base address indicated by msr[ip].

4.5.9 decrementer exception (0x00900)

the 603e implements the decrementer interrupt exception as it is defined in the powerpc architecture. a decrementer exception request is made when the decrementer counts down through zero. the request is held until there are no higher priority exceptions and msr[ee] = 1. at this point the decrementer exception is taken. if multiple decrementer exception requests are received before the first can be reported, only one exception is reported. the occurrence of a decrementer exception cancels the request. register settings for this exception are described in chapter 6, "exceptions," in the programming environments manual.

when a decrementer exception is taken, instruction execution for the handler begins at offset 0x00900 from the physical base address indicated by msr[ip].

4.5.10 system call exception (0x00c00)

the 603e implements the system call exception as it is defined by the powerpc architecture. a system call exception request is made when a system call (sc) instruction is completed. if no higher priority exception exists, the system call exception is taken, with srr0 being set to the ea of the instruction following the sc instruction. register settings for this exception are described in chapter 6, "exceptions," in the programming environments manual.

when a system call exception is taken, instruction execution for the handler begins at offset 0x00c00 from the physical base address indicated by msr[ip].
4.5.11 trace exception (0x00d00)

the trace exception is taken under one of the following conditions:
- when msr[se] is set, a single-step instruction trace exception is taken when no higher priority exception exists and any instruction (other than rfi or isync) is successfully completed. note that other powerpc processors will take the trace exception on isync instructions (when msr[se] is set); the 603e does not take the trace exception on isync instructions. single-step instruction trace mode is described in section 4.5.11.1, "single-step instruction trace mode."
- when msr[be] is set, the branch trace exception is taken after each branch instruction is completed. the 603e deviates from the architecture by not taking trace exceptions on isync instructions. branch trace mode is described in section 4.5.11.2, "branch trace mode."

successful completion implies that the instruction caused no other exceptions. a trace exception is never taken for an sc instruction or for a trap instruction that takes a trap exception.

msr[se] and msr[be] are cleared when the trace exception is taken. in the normal use of this function, msr[se] and msr[be] are restored when the exception handler returns to the interrupted program using an rfi instruction.

register settings for the trace mode are described in table 4-15.

note that a trace or instruction address breakpoint exception condition generates a soft stop instead of an exception if soft stop has been enabled by the jtag/cop logic. if trace and breakpoint conditions occur simultaneously, the breakpoint conditions receive higher priority.

when a trace exception is taken, instruction execution for the handler begins at offset 0x00d00 from the base address indicated by msr[ip].

table 4-15. trace exception: register settings

register   setting description
srr0       set to the address of the instruction following the one for which the trace exception was generated.
srr1       0-15: cleared
           16-31: loaded from bits 16-31 of the msr
msr        pow 0; tgpr 0; ile not altered; ip not altered; ee 0; pr 0; fp 0 (note 1); me not altered; fe0 0 (note 2); se 0; be 0; fe1 0 (note 2); ir 0; dr 0; ri 0; le set to value of ile

notes:
1. the floating-point available bit is always cleared to 0 on the EC603E microprocessor.
2. fe0 and fe1 are not supported on the EC603E microprocessor.
4.5.11.1 single-step instruction trace mode

the single-step instruction trace mode is enabled by setting msr[se]. encountering the single-step breakpoint causes one of the following actions:
- trap to address vector 0x00d00
- soft stop (wait for quiescence)

the default single-step trace action traps after an instruction execution and completion. the soft stop option, in which the 603e stops in a restartable state after an instruction execution and completion, can be enabled only through the cop function. the esp, which interfaces to the cop, can restart the 603e after a soft stop. refer to the section on jtag/cop and section 8.9, "ieee 1149.1-compliant interface," for more information.

4.5.11.2 branch trace mode

the branch trace mode is enabled by setting msr[be]. encountering the branch trace breakpoint causes one of the following actions:
- trap to interrupt vector 0x00d00
- soft stop
- hard stop

the default branch trace action is to trap after the completion of any branch instruction whenever msr[be] is set. however, if soft stop is enabled through the cop interface, the 603e stops in a restartable state. if hard stop is enabled through the cop interface, the 603e stops immediately without waiting to reach a restartable state. therefore, the 603e is not guaranteed to be restartable after a hard stop. for more information, see section 8.9, "ieee 1149.1-compliant interface."

4.5.12 instruction tlb miss exception (0x01000)

when the effective address for an instruction load, store, or cache operation cannot be translated by the itlbs, an instruction tlb miss exception is generated. register settings for the instruction and data tlb miss exceptions are described in table 4-16.
if the instruction tlb miss exception handler fails to find the desired pte, then a page fault must be synthesized. the handler must restore the machine state and turn off the tgprs (msr[tgpr]) before invoking the isi exception (0x00400). software table search operations are discussed in chapter 5, "memory management."

when an instruction tlb miss exception is taken, instruction execution for the handler begins at offset 0x01000 from the physical base address indicated by msr[ip].

4.5.13 data tlb miss on load exception (0x01100)

when the effective address for a data load or cache operation cannot be translated by the dtlbs, a data tlb miss on load exception is generated. register settings for the instruction and data tlb miss exceptions are described in table 4-16.

if a data tlb miss exception handler fails to find the desired pte, then a page fault must be synthesized. the handler must restore the machine state and turn off msr[tgpr] before invoking the dsi exception (0x00300). software table search operations are discussed in chapter 5, "memory management."

when a data tlb miss on load exception is taken, instruction execution for the handler begins at offset 0x01100 from the physical base address indicated by msr[ip].

table 4-16. instruction and data tlb miss exceptions: register settings

register   setting description
srr0       set to the address of the next instruction to be executed in the program for which the tlb miss exception was generated.
srr1       0-3: loaded from condition register cr0 field
           4-12: cleared
           13: 0 = data tlb miss; 1 = instruction tlb miss
           14: 0 = replace tlb associativity set 0; 1 = replace tlb associativity set 1
           15: 0 = data tlb miss on store (or c = 0); 1 = data tlb miss on load
           16-31: loaded from bits 16-31 of the msr
msr        pow 0; tgpr 1; ile not altered; ip not altered; ee 0; pr 0; fp 0 (note 1); me not altered; fe0 0 (note 2); se 0; be 0; fe1 0 (note 2); ir 0; dr 0; ri 0; le set to value of ile

notes:
1. the floating-point available bit is always cleared to 0 on the EC603E microprocessor.
2. fe0 and fe1 are not supported on the EC603E microprocessor.
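a software table search handler has to inspect the srr1 bits listed in table 4-16 to learn which tlb missed and which associativity set to replace. the sketch below shows one way to unpack them; the function and key names are hypothetical, and powerpc bit numbering (bit 0 = most significant) is assumed.

```python
def field(value, first, last):
    """extract bits first..last of a 32-bit value in powerpc numbering (bit 0 = most significant)."""
    width = last - first + 1
    return (value >> (31 - last)) & ((1 << width) - 1)


def decode_tlb_miss_srr1(srr1):
    """unpack the srr1 fields that table 4-16 defines for the tlb miss exceptions."""
    return {
        "cr0": field(srr1, 0, 3),                    # saved cr0 field
        "instruction_miss": bool(field(srr1, 13, 13)),
        "replace_set": field(srr1, 14, 14),          # associativity set to replace
        "load_miss": bool(field(srr1, 15, 15)),      # 0 means store miss (or c = 0)
        "saved_msr_16_31": srr1 & 0xFFFF,
    }


# example: data tlb miss on load, replace set 1, cr0 = 0b0010
info = decode_tlb_miss_srr1(0x20031032)
```

because the tlb miss exceptions set msr[tgpr], a real handler would run with the temporary gprs mapped in and must clear msr[tgpr] again before branching to the isi or dsi handler, as described above.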
4.5.14 data tlb miss on store exception (0x01200)

when the effective address for a data store or cache operation cannot be translated by the dtlbs, a data tlb miss on store exception is generated. the data tlb miss on store exception is also taken when the changed bit (c = 0) for a dtlb entry needs to be updated for a store operation. register settings for the instruction and data tlb miss exceptions are described in table 4-16.

if a data tlb miss exception handler fails to find the desired pte, then a page fault must be synthesized. the handler must restore the machine state and turn off the tgprs before invoking a dsi exception (0x00300). software table search operations are discussed in chapter 5, "memory management."

when a data tlb miss on store exception is taken, instruction execution for the handler begins at offset 0x01200 from the physical base address indicated by msr[ip].

4.5.15 instruction address breakpoint exception (0x01300)

the instruction address breakpoint is controlled by the iabr special purpose register. iabr[0-29] holds an effective address to which each instruction is compared. the exception is enabled by setting iabr[30]. note that the 603e ignores the translation enable bit (iabr[31]). the exception is taken when an instruction breakpoint address matches on the next instruction to complete. the instruction tagged with the match is not completed before the instruction address breakpoint exception is taken.

the breakpoint action can be one of the following:
- trap to interrupt vector 0x01300 (default)
- soft stop

the bit settings for when an instruction address breakpoint exception is taken are shown in table 4-17.

table 4-17. instruction address breakpoint exception: register settings

register   setting description
srr0       set to the address of the next instruction to be executed in the program for which the instruction address breakpoint exception was generated.
srr1       0-15: cleared
           16-31: loaded from bits 16-31 of the msr
msr        pow 0; tgpr 0; ile not altered; ip not altered; ee 0; pr 0; fp 0 (note 1); me not altered; fe0 0 (note 2); se 0; be 0; fe1 0 (note 2); ir 0; dr 0; ri 0; le set to value of ile

notes:
1. the floating-point available bit is always cleared to 0 on the EC603E microprocessor.
2. fe0 and fe1 are not supported on the EC603E microprocessor.
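the iabr layout described in section 4.5.15 (address in bits 0-29, enable in bit 30, bit 31 ignored by the 603e) can be illustrated with two small helpers. this is a sketch with hypothetical function names; it models only the address compare, not the completion-time behavior or the soft stop option.

```python
ADDRESS_MASK = 0xFFFFFFFC   # iabr[0-29]: word-aligned effective address
ENABLE_MASK = 0x00000002    # iabr[30]: breakpoint enable


def make_iabr(address, enable=True):
    """compose an iabr value from a breakpoint address and the enable bit."""
    return (address & ADDRESS_MASK) | (ENABLE_MASK if enable else 0)


def iabr_matches(iabr, instruction_ea):
    """return True if the breakpoint is enabled and the instruction address matches iabr[0-29]."""
    enabled = bool(iabr & ENABLE_MASK)
    return enabled and (iabr & ADDRESS_MASK) == (instruction_ea & ADDRESS_MASK)


# example: breakpoint on the instruction at 0x00001000
iabr = make_iabr(0x00001000)
```

the value produced by make_iabr would be written with mtspr and followed by a context-synchronizing instruction, as the manual requires for iabr updates.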
the default breakpoint action is to trap before the execution of the matching instruction.

the soft stop feature can be enabled only through the cop interface. with soft stop enabled, the 603e stops in a restartable state, while with hard stop enabled, the 603e stops immediately without attempting to reach a restartable state. upon restarting from a soft stop, the matching instruction is executed and completed unless it generates an exception. for soft stops, the next ten instructions that could have passed the iabr check can be monitored only by single-stepping the processor. when soft stops are used, the address compares must be separated by at least 10 instructions.

if soft stop is enabled, only one soft stop is generated before completion of an instruction with an iabr match, regardless of whether a soft stop is generated before that instruction for any other reason, such as trace mode on for the preceding instruction or a cop soft stop request.

table 4-18 shows the priority of actions taken when more than one mode is enabled for the same instruction.

note that a trace or instruction address breakpoint exception condition generates a soft stop instead of an exception if soft stop has been enabled by the jtag/cop logic. if trace and breakpoint conditions occur simultaneously, the breakpoint conditions receive higher priority.

the 603e requires that an mtspr instruction that updates the iabr be followed by a context-synchronizing instruction. if the mtspr instruction enables the instruction address breakpoint exception, the context-synchronizing instruction cannot generate a breakpoint response. the 603e also cannot block a breakpoint response on the context-synchronizing instruction if the breakpoint was disabled by the mtspr instruction. see "synchronization requirements for special registers and tlbs" in chapter 2, "register set," in the programming environments manual for more information on this requirement.

table 4-18. breakpoint action for multiple modes enabled for the same address

iabr[ie]   msr[be]   msr[se]   first action                     next action
1          1         0         instruction address breakpoint   trace (branch)
           comments: enabling both modes is useful only if both trace and address breakpoint interrupts are needed.
1          0         1         instruction address breakpoint   trace (single-step)
           comments: enabling both modes is useful only if different breakpoint actions are required.
0          1         1         trace (branch)                   none
           comments: the action for branch trace and single-step trace is the same. enabling both trace modes is redundant except for hard stop on branches.
1          1         1         instruction address breakpoint   trace
           comments: enabling all modes is redundant. this entry is for clarification only.
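for debugger or simulator code, the four rows of table 4-18 can be kept as a simple lookup. the sketch below is illustrative only: the names are hypothetical, the first action of the first row is read as "instruction address breakpoint", and combinations that the table does not list return None rather than a guessed result.

```python
# rows of table 4-18: (iabr[ie], msr[be], msr[se]) -> (first action, next action)
BREAKPOINT_ACTIONS = {
    (1, 1, 0): ("instruction address breakpoint", "trace (branch)"),
    (1, 0, 1): ("instruction address breakpoint", "trace (single-step)"),
    (0, 1, 1): ("trace (branch)", None),
    (1, 1, 1): ("instruction address breakpoint", "trace"),
}


def breakpoint_actions(iabr_ie, msr_be, msr_se):
    """return (first action, next action), or None if the combination is not a row of table 4-18."""
    key = (int(bool(iabr_ie)), int(bool(msr_be)), int(bool(msr_se)))
    return BREAKPOINT_ACTIONS.get(key)


# example: address breakpoint and single-step trace enabled for the same instruction
first, following = breakpoint_actions(1, 0, 1)
```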
4.5.16 system management interrupt (0x01400)

the system management interrupt behaves like an external interrupt except for the signal asserted and the vector taken. a system management interrupt is signaled to the 603e by the assertion of the smi signal. the interrupt may not be recognized if a higher priority exception occurs simultaneously or if the msr[ee] bit is cleared when smi is asserted. note that smi takes priority over int if they are recognized simultaneously.

after the smi is detected (and provided that msr[ee] is set), the 603e generates a recoverable halt to instruction completion. the 603e requires the next instruction in program order to complete or take an exception, blocks completion of any following instructions, and allows the completed store queue to drain. if any higher priority exceptions are encountered in this process, they are taken first and the system management interrupt is delayed until a recoverable halt is achieved. at this time the 603e saves state information and takes the system management interrupt.

the register settings for the system management interrupt are shown in table 4-19. when a system management interrupt is taken, instruction execution for the handler begins at offset 0x01400 from the physical base address indicated by msr[ip].

the 603e recognizes the interrupt condition (smi asserted) only if the msr[ee] bit is set, and ignores the interrupt condition otherwise. to guarantee that the system management interrupt is taken, the smi signal must be held active until the 603e takes the interrupt. if the smi signal is negated before the interrupt is taken, the 603e is not guaranteed to take a system management interrupt. the interrupt handler must send a command to the device that asserted smi, acknowledging the interrupt and instructing the device to negate smi.

table 4-19. system management interrupt: register settings

register   setting description
srr0       set to the effective address of the instruction that the processor would have attempted to complete next if no interrupt conditions were present.
srr1       0-15: cleared
           16-31: loaded from bits 16-31 of the msr
msr        pow 0; tgpr 0; ile not altered; ip not altered; ee 0; pr 0; fp 0 (note 1); me not altered; fe0 0 (note 2); se 0; be 0; fe1 0 (note 2); ir 0; dr 0; ri 0; le set to value of ile

notes:
1. the floating-point available bit is always cleared to 0 on the EC603E microprocessor.
2. fe0 and fe1 are not supported on the EC603E microprocessor.
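the recognition rules for smi and int (both gated by msr[ee], with smi taking priority when both are recognized simultaneously) can be summarized in a few lines. this is a simplified sketch with a hypothetical function name; it deliberately ignores higher priority exceptions and the recoverable-halt sequence described above.

```python
SMI_VECTOR = 0x01400
EXTERNAL_INTERRUPT_VECTOR = 0x00500


def select_interrupt(msr_ee, smi_asserted, int_asserted):
    """return the vector offset of the interrupt to take, or None if none is recognized."""
    if not msr_ee:
        return None                       # both conditions are ignored while msr[ee] = 0
    if smi_asserted:
        return SMI_VECTOR                 # smi takes priority over int
    if int_asserted:
        return EXTERNAL_INTERRUPT_VECTOR
    return None


# example: both signals asserted with msr[ee] set
vector = select_interrupt(True, True, True)
```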
chapter 5 memory management

this chapter describes the powerpc 603e microprocessor's implementation of the memory management unit (mmu) specifications provided by the powerpc operating environment architecture (oea) for powerpc processors. the 603e mmu implementation is very similar to that of the powerpc 603 microprocessor except that the 603e implements an extra key bit in the srr1 register that simplifies the table search software. in addition, because the 603e does not support direct-store bus accesses, it causes a dsi exception when a direct-store segment is encountered. refer to appendix c, "powerpc 603 processor system design and programming considerations," for a complete description of the differences applicable to the powerpc 603 microprocessor.

the primary function of the mmu in a powerpc processor is the translation of logical (effective) addresses to physical addresses (referred to as real addresses in the architecture specification) for memory accesses and i/o accesses (i/o accesses are assumed to be memory-mapped). in addition, the mmu provides access protection on a segment, block, or page basis. this chapter describes the specific hardware used to implement the mmu model of the oea in the 603e. refer to chapter 7, "memory management," in the programming environments manual for a complete description of the conceptual model.

two general types of accesses generated by powerpc processors require address translation: instruction accesses, and data accesses to memory generated by load and store instructions. generally, the address translation mechanism is defined in terms of segment descriptors and page tables used by powerpc processors to locate the effective-to-physical address mapping for instruction and data accesses. the segment information translates the effective address to an interim virtual address, and the page table information translates the virtual address to a physical address.

the segment descriptors, used to generate the interim virtual addresses, are stored as on-chip segment registers on 32-bit implementations (such as the 603e). in addition, two translation lookaside buffers (tlbs) are implemented on the 603e to keep recently-used page address translations on-chip. although the powerpc oea describes one mmu (conceptually), the 603e hardware maintains separate tlbs and table search resources for instruction and data accesses that can be accessed independently (and simultaneously). therefore, the 603e is described as having two mmus, one for instruction accesses (immu) and one for data accesses (dmmu).
The block address translation (BAT) mechanism is a software-controlled array that stores the available block address translations on-chip. BAT array entries are implemented as pairs of BAT registers that are accessible as supervisor-level special-purpose registers (SPRs). There are separate instruction and data BAT mechanisms, and in the 603e, they reside in the instruction and data MMUs respectively.

The MMUs, together with the exception processing mechanism, provide the necessary support for the operating system to implement a paged virtual memory environment and for enforcing protection of designated memory areas. Exception processing is described in Chapter 4, "Exceptions." Section 4.2, "Exception Processing," describes the MSR, which controls some of the critical functionality of the MMUs.

5.1 MMU Features

The 603e implements the memory management specification of the PowerPC OEA for 32-bit implementations. Thus, it provides 4 Gbytes of effective address space accessible to supervisor and user programs with a 4-Kbyte page size and 256-Mbyte segment size. In addition, the MMUs of 32-bit PowerPC processors use an interim virtual address (52 bits) and hashed page tables in the generation of 32-bit physical addresses. PowerPC processors also have a block address translation (BAT) mechanism for mapping large blocks of memory. Block sizes range from 128 Kbyte to 256 Mbyte and are software-programmable.

The 603e completely implements all features required by the MMU specifications of the PowerPC architecture (OEA) for 32-bit implementations. Table 5-1 summarizes all 603e MMU features, including the architectural features of PowerPC MMUs (defined by the OEA) for 32-bit processors and the implementation-specific features provided by the 603e.

Table 5-1. MMU Features Summary

Feature category (architecturally defined or 603e-specific), followed by the features:

Address ranges (architecturally defined)
    2^32 bytes of effective address
    2^52 bytes of virtual address
    2^32 bytes of physical address
Page size (architecturally defined)
    4 Kbytes
Segment size (architecturally defined)
    256 Mbytes
Block address translation (architecturally defined)
    Range of 128 Kbyte to 256 Mbyte sizes
    Implemented with IBAT and DBAT registers in BAT array
Memory protection (architecturally defined)
    Segments selectable as no-execute
    Pages selectable as user/supervisor and read-only
    Blocks selectable as user/supervisor and read-only
Page history (architecturally defined)
    Referenced and changed bits defined and maintained
Page address translation (architecturally defined)
    Translations stored as PTEs in hashed page tables in memory
    Page table size determined by mask in SDR1 register
TLBs (architecturally defined)
    Instructions for maintaining optional TLBs (tlbie instruction in 603e)
TLBs (603e-specific)
    64-entry, two-way set associative ITLB
    64-entry, two-way set associative DTLB
Segment descriptors (architecturally defined)
    Stored as segment registers on-chip
Page table search support (603e-specific)
    Three MMU exceptions defined: ITLB miss exception, DTLB miss on load exception, and DTLB miss on store (or C = 0) exception; MMU-related bits set in SRR1 for these exceptions
    IMISS and DMISS registers (missed effective address)
    HASH1 and HASH2 registers (PTEG address)
    ICMP and DCMP registers (for comparing PTEs)
    RPA register (for loading TLBs)
    tlbli rB instruction for loading ITLB entries
    tlbld rB instruction for loading DTLB entries
    Shadow registers for GPR0–GPR3 (can use r0–r3 in table search handler without corruption of r0–r3 in context that was previously executing)

5.1.1 Memory Addressing

A program references memory using the effective (logical) address computed by the processor when it executes a load, store, or cache instruction, and when it fetches the next instruction. The effective address is translated to a physical address according to the procedures described in Chapter 7, "Memory Management," in the Programming Environments Manual, augmented with information in this chapter. The memory subsystem uses the physical address for the access. For a complete discussion of effective address calculation, see Section 2.3.2.3, "Effective Address Calculation."

5.1.2 MMU Organization

Figure 5-1 shows the conceptual organization of a PowerPC MMU in a 32-bit implementation; note that it does not describe the specific hardware used to implement the memory management function for a particular processor. Processors may optionally implement on-chip TLBs and may optionally support the automatic search of the page tables for PTEs. In addition, other hardware features (invisible to the system software) not depicted in the figure may be implemented.
Figure 5-2 and Figure 5-3 show the conceptual organization of the 603e instruction and data MMUs, respectively. The instruction addresses shown in Figure 5-2 are generated by the processor for sequential instruction fetches and addresses that correspond to a change of program flow. Data addresses shown in Figure 5-3 are generated by load and store instructions and by cache instructions.

As shown in the figures, after an address is generated, the higher-order bits of the effective address, EA0–EA19 (or a smaller set of address bits, EA0–EAn, in the cases of blocks), are translated into physical address bits PA0–PA19. The lower-order address bits, A20–A31, are untranslated and therefore identical for both effective and physical addresses. After translating the address, the MMUs pass the resulting 32-bit physical address to the memory subsystem.

In addition to the higher-order address bits, the MMUs automatically keep an indicator of whether each access was generated as an instruction or data access and a supervisor/user indicator that reflects the state of the PR bit of the MSR when the effective address was generated. In addition, for data accesses, there is an indicator of whether the access is for a load or a store operation. This information is then used by the MMUs to appropriately direct the address translation and to enforce the protection hierarchy programmed by the operating system. Section 4.2, "Exception Processing," describes the MSR, which controls some of the critical functionality of the MMUs.

The figures show the way in which the A20–A26 address bits index into the on-chip instruction and data caches to select a cache set. The remaining physical address bits are then compared with the tag fields (comprising bits PA0–PA19) of the four selected cache blocks to determine if a cache hit has occurred. In the case of a cache miss, the instruction or data access is then forwarded to the bus interface unit, which then initiates an external memory access.
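The bit fields described above can be made concrete with a small sketch. It assumes PowerPC bit numbering (bit 0 is the most significant bit of the 32-bit address) and, for the virtual address, assumes that the upper 24 bits come from a virtual segment ID held in the selected segment register, as the Programming Environments Manual describes. The function names and the `vsid` and `rpn` parameters are illustrative only.

```python
def split_effective_address(ea):
    """Split a 32-bit effective address into the fields used by the MMUs."""
    ea &= 0xFFFFFFFF
    return {
        "sr_index": ea >> 28,               # EA0-EA3: selects one of 16 segment registers
        "page_index": (ea >> 12) & 0xFFFF,  # EA4-EA19: page within the 256-Mbyte segment
        "byte_offset": ea & 0xFFF,          # A20-A31: untranslated offset in the 4-Kbyte page
        "cache_set": (ea >> 5) & 0x7F,      # A20-A26: selects one of 128 cache sets
    }


def virtual_address(vsid, ea):
    """Form the 52-bit interim virtual address (assumed 24-bit VSID + 16 + 12 bits)."""
    fields = split_effective_address(ea)
    return ((vsid & 0xFFFFFF) << 28) | (fields["page_index"] << 12) | fields["byte_offset"]


def physical_address(rpn, ea):
    """Concatenate a 20-bit physical page number (PA0-PA19) with A20-A31."""
    return ((rpn & 0xFFFFF) << 12) | (ea & 0xFFF)
```

For example, the effective address 0x12345678 selects segment register 1, page index 0x2345, and byte offset 0x678; only the upper 20 bits are replaced during page address translation.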
Figure 5-1. MMU Conceptual Block Diagram (32-Bit Implementations)
[Figure not reproduced. The diagram shows instruction and data accesses entering the 32-bit MMU; EA0–EA3 selecting one of the 16 segment registers (0–15), which supply the upper 24 bits of the virtual address; the IBAT0U/IBAT0L through IBAT3U/IBAT3L and DBAT0U/DBAT0L through DBAT3U/DBAT3L register pairs, with a BAT hit indication; the optional on-chip TLBs and optional page table search logic; SDR1 (SPR25); the untranslated bits A20–A31; and the resulting physical address PA0–PA31.]

Figure 5-2. IMMU Block Diagram
[Figure not reproduced. The diagram shows the instruction unit and BPU supplying EA0–EA19 to the IMMU; the segment registers (0–15) selected by EA0–EA3; the ITLB; the IBAT array (IBAT0U/IBAT0L through IBAT3U/IBAT3L); the registers HASH1 (SPR978), HASH2 (SPR979), IMISS (SPR980), ICMP (SPR981), RPA (SPR982), and SDR1 (SPR25); and the instruction cache (sets 0–127), indexed by A20–A26, whose tags are compared with PA0–PA19 to produce the cache hit/miss indication. The output is PA0–PA31.]

Figure 5-3. DMMU Block Diagram
[Figure not reproduced. The diagram shows the load/store unit supplying EA0–EA19 to the DMMU; the segment registers (0–15) selected by EA0–EA3; the DTLB; the DBAT array (DBAT0U/DBAT0L through DBAT3U/DBAT3L); the registers DMISS (SPR976), DCMP (SPR977), HASH1 (SPR978), HASH2 (SPR979), RPA (SPR982), and SDR1 (SPR25); and the data cache (sets 0–127), indexed by A20–A26, whose tags are compared with PA0–PA19 to produce the cache hit/miss indication. The output is PA0–PA31.]
5.1.3 Address Translation Mechanisms

PowerPC processors support the following four types of address translation:

• Page address translation: translates the page frame address for a 4-Kbyte page size
• Block address translation: translates the block number for blocks that range in size from 128 Kbyte to 256 Mbyte
• Direct-store interface address translation: used to generate direct-store interface accesses on the external bus; not implemented in the 603e
• Real addressing mode translation: when address translation is disabled, the physical address is identical to the effective address

Figure 5-4 shows the three implemented address translation mechanisms provided by the 603e MMUs. The segment descriptors shown in the figure control the page address translation mechanism. When an access uses page address translation, the appropriate segment descriptor is required. In 32-bit implementations, one of the 16 on-chip segment registers (which contain segment descriptors) is selected by the four highest-order effective address bits.

A control bit in the corresponding segment descriptor then determines if the access is to memory (memory-mapped) or to the direct-store interface space (selected when the direct-store translation control bit (T bit) in the corresponding segment descriptor is set). Note that the direct-store interface is present only for compatibility with existing I/O devices that use this interface. When an access is determined to be to the direct-store interface space, the 603e takes a DSI exception as described in Section 4.5.3, "DSI Exception (0x00300)," if it is a data access, and takes an ISI exception as described in Section 4.5.4, "ISI Exception (0x00400)," if it is an instruction access.

For memory accesses translated by a segment descriptor, the interim virtual address is generated using the information in the segment descriptor. Page address translation corresponds to the conversion of this virtual address into the 32-bit physical address used by the memory subsystem. In most cases, the physical address for the page resides in an on-chip TLB and is available for quick access. However, if the page address translation misses in an on-chip TLB, the MMU causes a search of the page tables in memory (using the virtual address information and a hashing function) to locate the required physical address. When this occurs, the 603e vectors to exception handlers that search the page tables with software.

Block address translation occurs in parallel with page address translation and is similar to page address translation; however, fewer higher-order effective address bits are translated into physical address bits (more lower-order address bits (at least 17) are untranslated to form the offset into a block). Also, instead of segment descriptors and a TLB, block address translations use the on-chip BAT registers as a BAT array. If an effective address matches the corresponding field of a BAT register, the information in the BAT register is used to generate the physical address; in this case, the results of the page translation (occurring in parallel) are ignored (even if the segment corresponds to the direct-store interface space).

Figure 5-4. Address Translation Types
[Figure not reproduced. Starting from the 32-bit effective address (bits 0–31), the diagram shows: real addressing mode when address translation is disabled (MSR[IR] = 0 or MSR[DR] = 0), where effective address = physical address (see Section 5.2); block address translation on a match with the BAT registers (see Section 5.3); and, once the segment descriptor is located, either page address translation (T = 0), which forms the 52-bit virtual address (bits 0–51) and looks it up in the page table, or direct-store interface translation (T = 1), which results in a DSI/ISI exception. Each translated path produces a 32-bit physical address (bits 0–31).]

Real addressing mode translation occurs when address translation is disabled; in this case the physical address generated is identical to the effective address. Instruction and data address translation is enabled with the MSR[IR] and MSR[DR] bits, respectively. Thus when the processor generates an access, and the corresponding address translation enable bit in the MSR (MSR[IR] for instruction accesses and MSR[DR] for data accesses) is cleared, the resulting physical address is identical to the effective address and all other translation mechanisms are ignored.
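The statement that block address translation leaves at least 17 low-order bits untranslated follows directly from the 128-Kbyte minimum block size. The sketch below works that out for any block size in the supported range. It assumes block sizes are powers of two, as the architecture defines them; the function names and the `block_phys_base` parameter are illustrative, not register fields.

```python
MIN_BLOCK = 128 * 1024          # 128 Kbytes
MAX_BLOCK = 256 * 1024 * 1024   # 256 Mbytes


def untranslated_bits(block_size):
    """Number of low-order address bits that form the offset into a block."""
    is_power_of_two = block_size > 0 and (block_size & (block_size - 1)) == 0
    if not is_power_of_two or not MIN_BLOCK <= block_size <= MAX_BLOCK:
        raise ValueError("block size must be a power of two from 128 Kbytes to 256 Mbytes")
    return block_size.bit_length() - 1


def block_translate(ea, block_phys_base, block_size):
    """Replace the block-number bits of ea and keep the offset bits unchanged."""
    offset_mask = block_size - 1
    return (block_phys_base & ~offset_mask & 0xFFFFFFFF) | (ea & offset_mask)
```

A 128-Kbyte block leaves 17 bits untranslated and a 256-Mbyte block leaves 28, so between 15 and 4 high-order effective address bits are translated.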
5.1.4 Memory Protection Facilities

In addition to the translation of effective addresses to physical addresses, the MMUs provide access protection of supervisor areas from user access and can designate areas of memory as read-only as well as no-execute or guarded. Table 5-2 shows the eight protection options supported by the MMUs for pages.

Table 5-2. Access Protection Options for Pages

                                    User read       User    Supervisor read  Supervisor
Option                              I-fetch  Data   write   I-fetch  Data    write
Supervisor-only                     -        -      -       Y        Y       Y
Supervisor-only-no-execute          -        -      -       -        Y       Y
Supervisor-write-only               Y        Y      -       Y        Y       Y
Supervisor-write-only-no-execute    -        Y      -       -        Y       Y
Both user/supervisor                Y        Y      Y       Y        Y       Y
Both user/supervisor-no-execute     -        Y      Y       -        Y       Y
Both read-only                      Y        Y      -       Y        Y       -
Both read-only-no-execute           -        Y      -       -        Y       -

Y = access permitted
- = protection violation

The operating system programs whether instructions can be fetched from an area of memory by appropriately using the no-execute option provided in the segment descriptor. Each of the remaining options is enforced based on a combination of information in the segment descriptor and the page table entry. Thus, the supervisor-only option allows only read and write operations generated while the processor is operating in supervisor mode (corresponding to MSR[PR] = 0) to access the page. User accesses that map into a supervisor-only page cause an exception to be taken.

Finally, there is a facility in the VEA and OEA that allows pages or blocks to be designated as guarded, preventing out-of-order accesses that may cause undesired side effects. For example, areas of the memory map that are used to control I/O devices can be marked as guarded so that accesses (for example, instruction prefetches) do not occur unless they are explicitly required by the program.

For more information on memory protection, see "Memory Protection Facilities," in Chapter 7, "Memory Management," in the Programming Environments Manual.
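The eight page protection options can be expressed as a lookup table. The sketch below is a reading of Table 5-2 in which each entry follows from the option name (for example, every no-execute option denies instruction fetches in both modes); the dictionary, the `access_permitted` function, and the access labels are illustrative names rather than anything defined by the architecture.

```python
# Column order follows Table 5-2:
# (user I-fetch, user data read, user write,
#  supervisor I-fetch, supervisor data read, supervisor write)
PAGE_PROTECTION = {
    "supervisor-only":                  (False, False, False, True,  True, True),
    "supervisor-only-no-execute":       (False, False, False, False, True, True),
    "supervisor-write-only":            (True,  True,  False, True,  True, True),
    "supervisor-write-only-no-execute": (False, True,  False, False, True, True),
    "both user/supervisor":             (True,  True,  True,  True,  True, True),
    "both user/supervisor-no-execute":  (False, True,  True,  False, True, True),
    "both read-only":                   (True,  True,  False, True,  True, False),
    "both read-only-no-execute":        (False, True,  False, False, True, False),
}

_COLUMN = {"ifetch": 0, "read": 1, "write": 2}


def access_permitted(option, msr_pr, access):
    """Return True if the access is permitted.

    msr_pr is the MSR[PR] bit: 1 for user mode, 0 for supervisor mode.
    access is one of "ifetch", "read", or "write".
    """
    base = 0 if msr_pr else 3
    return PAGE_PROTECTION[option][base + _COLUMN[access]]
```

A user-mode load from a supervisor-only page, `access_permitted("supervisor-only", 1, "read")`, is reported as a protection violation, which corresponds to the exception described above.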
5.1.5 Page History Information

The MMUs of PowerPC processors also define referenced (R) and changed (C) bits in the page address translation mechanism that can be used as history information relevant to the page. This information can then be used by the operating system to determine which areas of memory to write back to disk when new pages must be allocated in main memory. While these bits are initially programmed by the operating system into the page table, the architecture specifies that the R and C bits may be maintained either by the processor hardware (automatically) or by some software-assist mechanism that updates these bits when required. The software table search routines used by the 603e set the R bit when a PTE is accessed; the 603e causes an exception (to vector to the software table search routines) when the C bit in the corresponding TLB entry requires updating.

5.1.6 General Flow of MMU Address Translation

The following sections describe the general flow used by PowerPC processors to translate effective addresses to virtual and then physical addresses.

5.1.6.1 Real Addressing Mode and Block Address Translation Selection

When an instruction or data access is generated and the corresponding instruction or data translation is disabled (MSR[IR] = 0 or MSR[DR] = 0), real addressing mode translation is used (physical address equals effective address) and the access continues to the memory subsystem as described in Section 5.2, "Real Addressing Mode."

Figure 5-5 shows the flow used by the MMUs in determining whether to select real addressing mode, block address translation, or to use the segment descriptor to select page address translation.
Figure 5-5. General Flow of Address Translation (Real Addressing Mode and Block)
[Flowchart not reproduced. An effective address is generated. If the corresponding translation is disabled (MSR[IR] = 0 for instruction accesses, MSR[DR] = 0 for data accesses), real addressing mode translation is performed (see the Programming Environments Manual) and the access continues to the memory subsystem. If translation is enabled (MSR[IR] = 1 or MSR[DR] = 1), the address is compared with the instruction or data BAT array, as appropriate. On a BAT array hit, the access is either permitted (the address is translated and the access continues to the memory subsystem) or protected (the access is faulted). On a BAT array miss, address translation is performed with the segment descriptor (see Figure 5-6).]

Note that if the BAT array search results in a hit, the access is qualified with the appropriate protection bits. If the access violates the protection mechanism, an exception (ISI or DSI exception) is generated.

5.1.6.2 Page Address Translation Selection

If address translation is enabled (real addressing mode not selected) and the effective address information does not match with a BAT array entry, then the segment descriptor must be located. Once the segment descriptor is located, the T bit in the segment descriptor selects whether the translation is to a page or to a direct-store interface segment as shown in Figure 5-6. Note that the 603e does not implement the direct-store interface, and accesses to these segments cause a DSI exception. In addition, Figure 5-6 also shows the way in which the no-execute protection is enforced; if the N bit in the segment descriptor is set and the access is an instruction fetch, the access is faulted as described in Chapter 7, "Memory Management," in the Programming Environments Manual. Note that the figure shows the flow for these cases as described by the PowerPC OEA, and so the TLB references are shown as optional. Since the 603e implements TLBs, these branches are valid and are described in more detail throughout this chapter.
Figure 5-6. General Flow of Page and Direct-Store Interface Address Translation
[Flowchart not reproduced. Address translation with the segment descriptor begins by using EA0–EA3 to select one of the 16 on-chip segment registers and checking the T bit in the segment descriptor. For a direct-store segment address (T = 1), a DSI/ISI exception is taken (in the case of instruction accesses, an ISI exception). For page address translation (T = 0), an instruction fetch with the N bit set in the segment descriptor (no-execute) is faulted; otherwise the 52-bit virtual address is generated from the segment descriptor and compared with the TLB entries. On a TLB hit, the access is either permitted (the address is translated and the access continues to the memory subsystem) or protected (the access is faulted). On a TLB miss, a page table search operation is performed: if the PTE is found, the TLB entry is loaded; if the PTE is not found, the access is faulted. The flowchart also refers to Figure 5-8 and Figure 5-9. The TLB steps are optional to the PowerPC architecture and are implemented in the 603e.]
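The selection flow described in Sections 5.1.6.1 and 5.1.6.2 can be summarized as a short decision function. This is a simplified sketch: it takes the MSR[IR], MSR[DR], T, and N bits and the BAT comparison result as plain Boolean parameters, returns a descriptive string rather than modelling exception vectors, and leaves out the block and page protection checks that follow a hit. The function name and return labels are illustrative.

```python
def select_translation(is_instruction, msr_ir, msr_dr, bat_hit, sr_t, sr_n):
    """Choose the translation path for one access, following the 603e flow."""
    enabled = msr_ir if is_instruction else msr_dr
    if not enabled:
        # Translation disabled: physical address equals effective address.
        return "real addressing mode"
    if bat_hit:
        # A BAT match overrides the segment descriptor, even when T = 1.
        return "block address translation"
    if sr_t:
        # Direct-store segments are not supported by the 603e.
        return "ISI exception" if is_instruction else "DSI exception"
    if is_instruction and sr_n:
        # No-execute segment: instruction fetch is faulted.
        return "ISI exception"
    return "page address translation"
```

Because the BAT comparison is tested before the T bit, a block translation hit is used even when the segment register for that address describes a direct-store segment, as noted in Section 5.1.3.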
If the T bit in the corresponding segment descriptor is zero, page address translation is selected. The information in the segment descriptor is then used to generate the 52-bit virtual address. The virtual address is then used to identify the page address translation information (stored as page table entries (PTEs) in a page table in memory). For increased performance, the 603e has two TLBs to store recently used PTEs on-chip.

If an access hits in the appropriate TLB, the page translation occurs and the physical address bits are forwarded to the memory subsystem. If the required PTE is not resident, the MMU requires a search of the page table. In this case, the 603e traps to one of three exception handlers for the system software to perform the page table search. If the PTE is successfully matched, a new TLB entry is created and the page translation is once again attempted. This time, the TLB is guaranteed to hit. Once the PTE is located, the access is qualified with the appropriate protection bits. If the access is a protection violation (not allowed), an exception (instruction access or data access) is generated.

If the PTE is not found by the table search operation, a page fault condition exists, and the TLB miss exception handlers synthesize either an ISI or DSI exception to handle the page fault.

5.1.7 MMU Exceptions Summary

In order to complete any memory access, the effective address must be translated to a physical address. In the 603e, an MMU exception condition occurs if this translation fails for one of the following reasons:

• Page fault: there is no valid entry in the page table for the page specified by the effective address (and segment descriptor) and there is no valid BAT translation.
• An address translation is found but the access is not allowed by the memory protection mechanism.

Additionally, because the 603e relies on software to perform table search operations, the processor also takes an exception when:

• There is a miss in the corresponding (instruction or data) TLB.
• The page table requires an update to the changed (C) bit.

The state saved by the processor for each of these exceptions contains information that identifies the address of the failing instruction. Refer to Chapter 4, "Exceptions," for a more detailed description of exception processing.
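The interaction between a TLB miss, the software table search, and the retried access can be illustrated with a toy model. The sketch below uses plain dictionaries keyed by virtual page number in place of the hardware TLB and the hashed page table, and a Python exception in place of the synthesized ISI or DSI exception; none of these names correspond to 603e registers or handler code.

```python
class PageFault(Exception):
    """Stands in for the ISI or DSI exception that the handler synthesizes."""


def table_search(page_table, vpn):
    """Software table search: return the physical page number or raise PageFault."""
    try:
        return page_table[vpn]
    except KeyError:
        raise PageFault(hex(vpn)) from None


def translate(tlb, page_table, vpn):
    """Return (physical page number, number of TLB miss exceptions taken)."""
    misses = 0
    while vpn not in tlb:                         # TLB miss: trap to software
        misses += 1
        tlb[vpn] = table_search(page_table, vpn)  # handler loads a new TLB entry
    return tlb[vpn], misses                       # the retried access now hits
```

The first access to a mapped page takes one miss and then hits on the retry; later accesses to the same page hit directly, and an access to an unmapped page ends in the page fault path.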
Because a page fault condition (PTE not found in the page tables in memory) is detected by the software that performs the table search operation (and not the 603e hardware), it does not cause a 603e exception in the strictest sense, in that exception processing as described in Chapter 4, "Exceptions," does not occur. However, in order to maintain architectural compatibility with software written for other PowerPC devices, the software that detects this condition should synthesize an exception by setting the appropriate bits in the DSISR or SRR1 and branching to the ISI or DSI exception handler. Refer to Section 5.5.2, "Implementation-Specific Table Search Operation," for more information and examples of this exception software. The remainder of this chapter assumes that the table search software emulates this exception and refers to this condition as an exception.

The translation exception conditions defined by the OEA for 32-bit implementations cause either the ISI or the DSI exception to be taken as shown in Table 5-3.

Table 5-3. Translation Exception Conditions

Condition: Page fault (no PTE found)
    Description: No matching PTE found in page tables (and no matching BAT array entry)
    Exception: I access: ISI exception*, SRR1[1] = 1
               D access: DSI exception*, DSISR[1] = 1

Condition: Block protection violation
    Description: Conditions described for block in "Block Memory Protection" in Chapter 7, "Memory Management," in the Programming Environments Manual
    Exception: I access: ISI exception, SRR1[4] = 1
               D access: DSI exception, DSISR[4] = 1

Condition: Page protection violation
    Description: Conditions described for page in "Page Memory Protection" in Chapter 7, "Memory Management," in the Programming Environments Manual
    Exception: I access: ISI exception**, SRR1[4] = 1
               D access: DSI exception**, DSISR[4] = 1

Condition: No-execute protection violation
    Description: Attempt to fetch instruction when SR[N] = 1
    Exception: ISI exception, SRR1[3] = 1

Condition: Instruction fetch from direct-store segment
    Description: Attempt to fetch instruction when SR[T] = 1
    Exception: ISI exception, SRR1[3] = 1

Condition: Data access to direct-store segment (including floating-point accesses). Note: This is a 603e-specific condition.
    Description: Attempt to perform load or store (including floating-point load or store***) when SR[T] = 1
    Exception: DSI exception, DSISR[5] = 1

Condition: Instruction fetch from guarded memory with MSR[IR] = 1
    Description: Attempt to fetch instruction when MSR[IR] = 1 and either matching xBAT[G] = 1, or no matching BAT entry and PTE[G] = 1
    Exception: ISI exception, SRR1[3] = 1

* The 603e hardware does not vector to these exceptions automatically. It is assumed that the software that performs the table search operations vectors to these exceptions and sets the appropriate bits when a page fault condition occurs.
** The table search software can also vector to these exception conditions.
*** The EC603E microprocessor does not support the floating-point unit.
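Because the table search software is expected to set the SRR1 or DSISR bits itself when it synthesizes a page fault, it helps to see how the bit numbers in Table 5-3 map to masks. The sketch below assumes the PowerPC convention that bit 0 is the most significant bit of a 32-bit register. The `ppc_bit` helper, the dictionary, and the short condition labels are illustrative names.

```python
def ppc_bit(n, width=32):
    """Mask for bit n of a register, where bit 0 is the most significant bit."""
    if not 0 <= n < width:
        raise ValueError("bit number out of range")
    return 1 << (width - 1 - n)


# (register, bit number) per condition, taken from Table 5-3.
STATUS_BITS = {
    ("page fault", "instruction"): ("SRR1", 1),
    ("page fault", "data"): ("DSISR", 1),
    ("protection violation", "instruction"): ("SRR1", 4),
    ("protection violation", "data"): ("DSISR", 4),
    ("no-execute", "instruction"): ("SRR1", 3),
    ("direct-store", "instruction"): ("SRR1", 3),
    ("direct-store", "data"): ("DSISR", 5),
}


def status_mask(condition, access):
    """Return (register name, mask) to set for a synthesized exception."""
    register, bit = STATUS_BITS[(condition, access)]
    return register, ppc_bit(bit)
```

For a data page fault, `status_mask("page fault", "data")` yields the DSISR register with mask 0x40000000, which is DSISR[1] in the manual's notation.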
In addition to the translation exceptions, there are other MMU-related conditions (some of them defined as implementation-specific and therefore not required by the architecture) that can cause an exception to occur in the 603e. These exception conditions map to the processor exception as shown in Table 5-4. For example, the 603e also defines three exception conditions to support software table searching. The only exception conditions that occur when MSR[DR] = 0 are the conditions that cause the alignment exception for data accesses. For more detailed information about the conditions that cause the alignment exception (in particular for string/multiple instructions), see Section 4.5.6, "Alignment Exception (0x00600)."

Note that some exception conditions depend upon whether the memory area is set up as write-through (W = 1) or cache-inhibited (I = 1). These bits are described fully in "Memory/Cache Access Attributes," in Chapter 5, "Cache Model and Memory Coherency," of the Programming Environments Manual. Refer to Chapter 4, "Exceptions," and to Chapter 6, "Exceptions," in the Programming Environments Manual for a complete description of the SRR1 and DSISR bit settings for these exceptions.

Table 5-4. Other MMU Exception Conditions

Condition: TLB miss for an instruction fetch
    Description: No matching entry found in ITLB
    Exception: Instruction TLB miss exception, SRR1[13] = 1, MSR[14] = 1

Condition: TLB miss for a data access
    Description: No matching entry found in DTLB for data access
    Exception: Load: data TLB miss on load exception, MSR[14] = 1
               Store: data TLB miss on store exception, SRR1[15] = 1, MSR[14] = 1

Condition: Store operation and C = 0
    Description: Matching DTLB entry has C = 0 and access is a store
    Exception: Data TLB miss on store exception, SRR1[15] = 1, MSR[14] = 1

Condition: dcbz with W = 1 or I = 1
    Description: dcbz instruction to write-through or cache-inhibited segment or block
    Exception: Alignment exception (not required by architecture for this condition)

Condition: dcbz when the data cache is locked
    Description: The dcbz instruction takes an alignment exception if the data cache is locked (HID0 bits 18 and 19) when it is executed.
    Exception: Alignment exception

Condition: lwarx or stwcx. with W = 1
    Description: Reservation instruction to write-through segment or block
    Exception: DSI exception, DSISR[5] = 1

Condition: lwarx, stwcx., eciwx, or ecowx instruction to direct-store segment
    Description: Reservation instruction or external control instruction when SR[T] = 1
    Exception: DSI exception, DSISR[5] = 1

Condition: Floating-point load or store to direct-store segment*
    Description: FP memory access when SR[T] = 1
    Exception: See data access to direct-store segment in Table 5-3.

Condition: Load or store that results in a direct-store error
    Description: Does not occur in 603e
    Exception: Does not apply

Condition: eciwx or ecowx attempted when external control facility disabled
    Description: eciwx or ecowx attempted with EAR[E] = 0
    Exception: DSI exception, DSISR[11] = 1

Condition: lmw, stmw, lswi, lswx, stswi, or stswx instruction attempted in little-endian mode
    Description: lmw, stmw, lswi, lswx, stswi, or stswx instruction attempted while MSR[LE] = 1
    Exception: Alignment exception

Condition: Operand misalignment
    Description: Translation enabled and operand is misaligned as described in Chapter 4, "Exceptions."
    Exception: Alignment exception (some of these cases are implementation-specific)

* The EC603E microprocessor does not support the floating-point unit.

5.1.8 MMU Instructions and Register Summary

The MMU instructions and registers provide the operating system with the ability to set up the block address translation areas and the page tables in memory.

Note that because the implementation of TLBs is optional, the instructions that refer to these structures are also optional. However, because these structures serve as caches of the page table, the architecture specifies a software protocol for maintaining coherency between these caches and the tables in memory whenever changes are made to the tables in memory. When the tables in memory are changed, the operating system purges these caches of the corresponding entries, allowing the translation caching mechanism to refetch from the tables when the corresponding entries are required.

Note that the 603e implements all TLB-related instructions except tlbia, which is treated as an illegal instruction. The 603e also uses some implementation-specific instructions to load two on-chip TLBs.

Because the MMU specification for PowerPC processors is so flexible, it is recommended that the software that uses these instructions and registers be "encapsulated" into subroutines to minimize the impact of migrating across the family of implementations.

Table 5-5 summarizes 603e instructions that specifically control the MMU. For more detailed information about the instructions, refer to Chapter 2, "Programming Model," in this book and Chapter 8, "Instruction Set," in the Programming Environments Manual.
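The tlbie entry in Table 5-5 states that the TLB index corresponds to bits 15–19 of the effective address and that four TLB entries are invalidated. The sketch below shows how those numbers follow from the 64-entry, two-way set-associative organization listed in Table 5-1. It assumes PowerPC bit numbering (bit 0 is the most significant bit); the constant and function names are illustrative.

```python
TLB_ENTRIES = 64
TLB_WAYS = 2
TLB_SETS = TLB_ENTRIES // TLB_WAYS   # 32 sets, so a 5-bit index


def tlb_set_index(ea):
    """Set index taken from EA15-EA19, the low 5 bits of the page number."""
    return (ea >> 12) & (TLB_SETS - 1)


def tlbie_targets(ea):
    """List the (tlb, set, way) entries that one tlbie would invalidate."""
    index = tlb_set_index(ea)
    return [(tlb, index, way) for tlb in ("ITLB", "DTLB") for way in range(TLB_WAYS)]
```

Since the index uses only five address bits, every effective address whose EA15–EA19 bits match maps to the same set, so a single tlbie clears both ways of that set in each TLB.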
5-18 mpc603e & EC603E risc microprocessors user's manual motorola table 5-6 summarizes the registers that the operating system uses to program the 603e mmus. these registers are accessible to supervisor-level software only. these registers are described in chapter 2, ?egister set,?in the programming environments manual. for 603e-speci? registers, see chapter 2, ?rogramming model,?of this book. table 5-5. instruction summary?mu control instruction description mtsr sr ,r s move to segment register sr[sr#] ? r s mtsrin r s ,r b move to segment register indirect sr[ r b[0?]] ? r s mfsr r d , sr move from segment register r d ? sr[sr#] mfsrin r d ,r b move from segment register indirect r d ? sr[ r b[0?]] tlbie r b* tlb invalidate entry for effective address speci?d by r b, tlb[v] ? 0 the tlbie instruction invalidates both tlb entries indexed by the ea, and operates on both the instruction and data tlbs simultaneously invalidating four tlb entries. the index corresponds to bits 15?9 of the ea. tlbsync * tlb synchronize synchronizes the execution of all other tlbie instructions in the system. in the 603e, when the tlbisync signal is negated, instruction execution may continue or resume after the completion of a tlbsync instruction. when the tlbisync signal is asserted, instruction execution stops after the completion of a tlbsync instruction. tlbli (603e-speci?) load instruction tlb entry loads the contents of the icmp and rpa registers into the itlb tlbld (603e-speci?) load data tlb entry loads the contents of the dcmp and rpa registers into the dtlb *these instructions are de?ed by the powerpc architecture, but are optional. table 5-6. mmu registers register description segment registers (sr0?r15) the sixteen 32-bit segment registers are present only in 32-bit implementations of the powerpc architecture. the ?lds in the segment register are interpreted differently depending on the value of bit 0. 
the segment registers are accessed by the mtsr , mtsrin , mfsr , and mfsrin instructions. bat registers (ibat0u?bat3u, ibat0l?bat3l, dbat0u?bat3u, and dbat0l?bat3l) there are 16 bat registers, organized as four pairs of instruction bat registers (ibat0u?bat3u paired with ibat0l?bat3l) and four pairs of data bat registers (dbat0u?bat3u paired with dbat0l?bat3l). the bat registers are de?ed as 32-bit registers in 32-bit implementations. these are special-purpose registers that are accessed by the mtspr and mfspr instructions. sdr1 the sdr1 register speci?s the variable used in accessing the page tables in memory. sdr1 is de?ed as a 32-bit register for 32-bit implementations. this is a special-purpose register that is accessed by the mtspr and mfspr instructions.
Table 5-6. MMU Registers (continued)

Instruction TLB miss address and data TLB miss address registers (IMISS and DMISS)
    When a TLB miss exception occurs, the IMISS or DMISS register contains the 32-bit effective address of the instruction or data access, respectively, that caused the miss. Note that the 603e always loads a big-endian address into the DMISS register. These registers are 603e-specific.

Primary and secondary hash address registers (HASH1 and HASH2)
    The HASH1 and HASH2 registers contain the primary and secondary PTEG addresses that correspond to the address causing a TLB miss. These PTEG addresses are automatically derived by the 603e by performing the primary and secondary hashing function on the contents of IMISS or DMISS, for an ITLB or DTLB miss exception, respectively. These registers are 603e-specific.

Instruction and data PTE compare registers (ICMP and DCMP)
    The ICMP and DCMP registers contain the word to be compared with the first word of a PTE in the table search software routine to determine if a PTE contains the address translation for the instruction or data access. The contents of ICMP and DCMP are automatically derived by the 603e when a TLB miss exception occurs. These registers are 603e-specific.

Required physical address register (RPA)
    The system software loads a TLB entry by loading the second word of the matching PTE entry into the RPA register and then executing the tlbli or tlbld instruction (for loading the ITLB or DTLB, respectively). This register is 603e-specific.

Note that the 603e contains other features that do not specifically control the 603e MMU but are implemented to increase performance and flexibility. These are:

- Complete set of shadow segment registers for the instruction MMU. These registers are invisible to the programming model, as described in Section 5.4.3, "TLB Description."
- Temporary GPR0-GPR3. These registers are available as r0-r3 when MSR[TGPR] is set. The 603e automatically sets MSR[TGPR] whenever one of the three TLB miss exceptions occurs, giving these exception handlers four scratchpad registers without having to save or restore this part of the machine state that existed when the exception occurred. Note that MSR[TGPR] is restored to the value in SRR1 when the rfi instruction is executed. Refer to Section 5.5.2, "Implementation-Specific Table Search Operation," for code examples that take advantage of these registers.

In addition, the 603e automatically saves the values of CR[CR0] of the executing context to SRR1[0-3] whenever one of the three TLB miss exceptions occurs. Thus, the exception handler can set CR[CR0] bits and branch accordingly in the exception handler routine, without having to save the existing CR[CR0] bits. However, the exception handler must restore these bits to CR[CR0] before executing the rfi instruction. There are also four other bits saved in SRR1 whenever a TLB miss exception occurs. They indicate whether the access was an instruction or data access and, if it was a data access, whether it was for a load or a store instruction. These bits also give some information related to the protection attributes for the access, and which set in the TLB will be replaced when the next TLB entry is loaded. Refer to Section 5.5.2.1, "Resources for Table Search Operations," for more information on these bits and their use.

5.2 Real Addressing Mode

If address translation is disabled (MSR[IR] = 0 or MSR[DR] = 0) for a particular access, the effective address is treated as the physical address and is passed directly to the memory subsystem as described in Chapter 7, "Memory Management," in the Programming Environments Manual.

Note that the default WIMG bits (0b0011) cause data accesses to be considered cacheable (I = 0) and thus load and store accesses are weakly ordered. This is the case even if the data cache is disabled in the HID0 register (as it is out of hard reset). If I/O devices require load and store accesses to occur in strict program order (strongly ordered), translation must be enabled so that the corresponding I bit can be set. Also, for instruction accesses, the default memory access mode bits (WIMG) are 0b0001. That is, instruction accesses are considered cacheable (I = 0), and the memory is guarded. Again, instruction accesses are considered cacheable even if the instruction cache is disabled in the HID0 register (as it is out of hard reset). The W and M bits have no effect on the instruction cache.

For information on the synchronization requirements for changes to MSR[IR] and MSR[DR], refer to "Synchronization Requirements for Special Registers and for Lookaside Buffers" in Chapter 2, "PowerPC Register Set," in the Programming Environments Manual.

5.3 Block Address Translation

The block address translation (BAT) mechanism in the OEA provides a way to map ranges of effective addresses larger than a single page into contiguous areas of physical memory. Such areas can be used for data that is not subject to normal virtual memory handling (paging), such as a memory-mapped display buffer or an extremely large array of numerical data.
The software model for block address translation in the 603e is described in Chapter 7, "Memory Management," in the Programming Environments Manual for 32-bit implementations.

Implementation Note: The 603e BAT registers are not initialized by the hardware after the power-up or reset sequence. Consequently, all valid bits in both instruction and data BAT areas must be cleared before setting any BAT area for the first time. This is true regardless of whether address translation is enabled. Also, software must avoid overlapping blocks while updating a BAT area or areas. Even if translation is disabled, multiple BAT area hits are treated as programming errors and can corrupt the BAT registers and produce unpredictable results.
5.4 Memory Segment Model

The 603e adheres to the memory segment model as defined in Chapter 7, "Memory Management," in the Programming Environments Manual for 32-bit implementations. Memory in the PowerPC OEA is divided into 256-Mbyte segments. This segmented memory model provides a way to map 4-Kbyte pages of effective addresses to 4-Kbyte pages in physical memory (page address translation), while providing the programming flexibility afforded by a large virtual address space (52 bits).

The segment/page address translation mechanism may be superseded by the block address translation (BAT) mechanism described in Section 5.3, "Block Address Translation." If not, the translation proceeds in the following two steps:

1. From effective address to the virtual address (which never exists as a specific entity but can be considered to be the concatenation of the virtual page number and the byte offset within a page), and
2. From virtual address to physical address.

This section highlights those areas of the memory segment model defined by the OEA that are specific to the 603e.

5.4.1 Page History Recording

Referenced (R) and changed (C) bits reside in each PTE to keep history information about the page. They are maintained by a combination of the 603e hardware and the table search software. The operating system uses this information to determine which areas of memory to write back to disk when new pages must be allocated in main memory. Referenced and changed recording is performed only for accesses made with page address translation and not for translations made with the BAT mechanism or for accesses that correspond to direct-store interface (T = 1) segments. Furthermore, R and C bits are maintained only for accesses made while address translation is enabled (MSR[IR] = 1 or MSR[DR] = 1).

In the 603e, the referenced and changed bits are updated as follows:

- For TLB hits, the C bit is updated according to Table 5-7.
- For TLB misses, when a table search operation is in progress to locate a PTE, the R and C bits are updated (set, if required) to reflect the status of the page based on this access.
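The first of the two translation steps outlined in Section 5.4 (effective address to virtual address) can be illustrated with a short sketch. This is a minimal model, not 603e code: the function names and the `segment_vsids` list are hypothetical, and the field widths are taken from the text (256-Mbyte segments give a 4-bit segment number, 4-Kbyte pages give a 12-bit byte offset, and the 52-bit virtual address leaves 24 bits for the VSID).

```python
# Sketch of EA -> 52-bit VA formation under the field widths stated in the text.
# All names here are illustrative assumptions, not part of the 603e programming model.

def split_effective_address(ea):
    """Split a 32-bit EA into (segment register number, page index, byte offset)."""
    sr_number = (ea >> 28) & 0xF       # EA0-EA3: selects one of 16 segment registers
    page_index = (ea >> 12) & 0xFFFF   # EA4-EA19: 16-bit page index within the segment
    byte_offset = ea & 0xFFF           # EA20-EA31: byte offset within the 4-Kbyte page
    return sr_number, page_index, byte_offset


def virtual_address(ea, segment_vsids):
    """Concatenate VSID || page index || byte offset into a 52-bit virtual address."""
    sr_number, page_index, byte_offset = split_effective_address(ea)
    vsid = segment_vsids[sr_number] & 0xFFFFFF  # 24-bit VSID from the segment register
    return (vsid << 28) | (page_index << 12) | byte_offset
```

The virtual page number used in the table search is the VSID concatenated with the page index, that is, the virtual address with the 12-bit byte offset removed.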
Table 5-7 shows that the status of the C bit in the TLB entry (in the case of a TLB hit) is what causes the processor to update the C bit in the PTE (the R bit is assumed to be set in the page tables if there is a TLB hit). Therefore, when software clears the R and C bits in the page tables in memory, it must invalidate the TLB entries associated with the pages whose referenced and changed bits were cleared.

The 603e causes the R bit to be set for the execution of the dcbt or dcbtst instruction to that page (by causing a TLB miss exception to load the TLB entry in the case of a TLB miss). However, neither of these instructions causes the C bit to be set.

The update of the referenced and changed bits is performed by PowerPC processors as if address translation were disabled (real addressing mode translation). Additionally, these updates should be performed with single-beat read and byte write transactions on the bus.

5.4.1.1 Referenced Bit

The referenced (R) bit of a page is located in the PTE in the page table. Every time a page is referenced (with a read or write access) and the R bit is zero, the R bit is then set in the page table. The OEA specifies that the referenced bit may be set immediately, or the setting may be delayed until the memory access is determined to be successful. Because the reference to a page is what causes a PTE to be loaded into the TLB, the referenced bit in all 603e TLB entries is effectively always set. The processor never automatically clears the referenced bit.

The referenced bit is only a hint to the operating system about the activity of a page. At times, the referenced bit may be set although the access was not logically required by the program or even if the access was prevented by memory protection.
Examples of this in PowerPC systems include the following:

- Fetching of instructions not subsequently executed
- Accesses generated by an lswx or stswx instruction with a zero length
- Accesses generated by a stwcx. instruction when no store is performed because a reservation does not exist
- Accesses that cause exceptions and are not completed

Table 5-7. Table Search Operations to Update History Bits - TLB Hit Case

R and C bits in TLB entry | Processor action
00 | Combination doesn't occur
01 | Combination doesn't occur
10 | Read: no special action. Write: table search operation required to update C; causes a data TLB miss on store exception
11 | No special action for read or write
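Table 5-7 can be read as a small decision function. The sketch below is only an illustration of the table; the function name and the returned strings are hypothetical and do not correspond to any 603e register or instruction.

```python
# Illustrative encoding of Table 5-7 (TLB hit case). Names are assumptions.

def tlb_hit_action(r_bit, c_bit, is_store):
    """Return the processor action for a TLB hit, given the R and C bits of the entry."""
    if r_bit == 0:
        # Rows 00 and 01: R is effectively always set in a 603e TLB entry.
        raise ValueError("R = 0 does not occur in a 603e TLB entry")
    if c_bit == 0 and is_store:
        # Row 10, write: software must run a table search to set C in the PTE.
        return "data TLB miss on store exception"
    # Row 10, read, and row 11, read or write.
    return "no special action"
```

Only the store-with-C-clear case hands control to software, which is why clearing R and C in the page tables must be paired with invalidating the corresponding TLB entries.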
5.4.1.2 Changed Bit

The changed bit of a page is located both in the PTE in the page table and in the copy of the PTE loaded into the TLB (if a TLB is implemented, as in the 603e). Whenever a data store instruction is executed successfully, if the TLB search (for page address translation) results in a hit, the changed bit in the matching TLB entry is checked. If it is already set, the processor does not change the C bit. If the TLB changed bit is 0, it is set and a table search operation is performed to also set the C bit in the corresponding PTE in the page table. The 603e causes a data TLB miss on store exception for this case so that the software can perform the table search operation for setting the C bit. Refer to Section 5.5.2, "Implementation-Specific Table Search Operation," for an example code sequence that handles these conditions.

The changed bit (in both the TLB and the PTE in the page tables) is set only when a store operation is allowed by the page memory protection mechanism and all conditional branches occurring earlier in the program have been resolved (such that the store is guaranteed to be in the execution path). Furthermore, the following conditions may cause the C bit to be set:

- The execution of an stwcx. instruction is allowed by the memory protection mechanism but a store operation is not performed because no reservation exists.
- The execution of an stswx instruction is allowed by the memory protection mechanism but a store operation is not performed because the specified length is zero.
- The store operation is not performed because an exception occurs before the store is performed.

Again, note that although the execution of the dcbt and dcbtst instructions may cause the R bit to be set, they never cause the C bit to be set.
5.4.1.3 Scenarios for Referenced and Changed Bit Recording

This section provides a summary of the model (defined by the OEA) that is used by PowerPC processors for maintaining the referenced and changed bits. In some scenarios, the bits are guaranteed to be set by the processor; in some scenarios, the architecture allows that the bits may be set (not absolutely required); and in some scenarios, the bits are guaranteed not to be set. In implementations that do not maintain the R and C bits in hardware (such as the 603e), software assistance is required. For these processors, the information in this section still applies, except that the software performing the updates is constrained to the rules described (that is, it must set bits shown as guaranteed to be set and must not set bits shown as guaranteed not to be set).
Table 5-8 defines a prioritized list of the R and C bit settings for all scenarios. The entries in the table are prioritized from top to bottom, such that a matching scenario occurring closer to the top of the table takes precedence over a matching scenario closer to the bottom of the table. For example, if an stwcx. instruction causes a protection violation and there is no reservation, the C bit is not altered, as shown for the protection violation case. Note that in the table, load operations include those generated by load instructions, by the eciwx instruction, and by the cache management instructions that are treated as a load with respect to address translation. Similarly, store operations include those operations generated by store instructions, by the ecowx instruction, and by the cache management instructions that are treated as a store with respect to address translation. In the columns for the 603e, the combination of the 603e itself and the software used to search the page tables (described in Section 5.5.2, "Implementation-Specific Table Search Operation") is assumed.

Table 5-8. Model for Guaranteed R and C Bit Settings

Priority | Scenario | R bit set (OEA) | R bit set (603e) | C bit set (OEA) | C bit set (603e)
1 | No-execute protection violation | No | No | No | No
2 | Page protection violation | Maybe | Yes | No | No
3 | Out-of-order instruction fetch or load operation | Maybe | No | No | No
4 | Out-of-order store operation for instructions that will cause no other kind of precise exception (in the absence of system-caused, imprecise, or floating-point assist exceptions) (1) | Maybe (2) | No | No | No
5 | All other out-of-order store operations | Maybe (2) | No | Maybe (2) | No
6 | Zero-length load (lswx) | Maybe | Yes | No | No
7 | Zero-length store (stswx) | Maybe (2) | Yes | Maybe (2) | Yes
8 | Store conditional (stwcx.) that does not store | Maybe (2) | Yes | Maybe (2) | Yes
9 | In-order instruction fetch | Yes (3) | Yes | No | No
10 | Load instruction or eciwx | Yes | Yes | No | No
11 | Store instruction, ecowx, or dcbz instruction | Yes | Yes | Yes | Yes
12 | dcbt, dcbtst, dcbst, or dcbf instruction | Maybe | Yes | No | No
13 | icbi instruction | Maybe (2) | No | No (2) | No
14 | dcbi instruction | Maybe (2) | Yes | Maybe (2) | Yes

(1) The EC603e microprocessor does not support the floating-point unit.
(2) If C is set, R is guaranteed to also be set.
(3) This includes the case in which the instruction was fetched out-of-order and R was not set (does not apply for 603e).
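The top-to-bottom precedence of Table 5-8 can be sketched as a lookup that picks the highest-priority matching scenario. The sketch below transcribes only the 603e columns; the scenario strings are shortened labels chosen for illustration, and `rc_settings` is a hypothetical helper, not something defined by the manual.

```python
# 603e columns of Table 5-8 as (priority, scenario label, R set, C set).
# Scenario labels are abbreviated, illustrative names.
TABLE_5_8_603E = [
    (1, "no-execute protection violation", False, False),
    (2, "page protection violation", True, False),
    (3, "out-of-order instruction fetch or load", False, False),
    (4, "out-of-order store, no other precise exception", False, False),
    (5, "other out-of-order store", False, False),
    (6, "zero-length load (lswx)", True, False),
    (7, "zero-length store (stswx)", True, True),
    (8, "stwcx. that does not store", True, True),
    (9, "in-order instruction fetch", True, False),
    (10, "load or eciwx", True, False),
    (11, "store, ecowx, or dcbz", True, True),
    (12, "dcbt, dcbtst, dcbst, or dcbf", True, False),
    (13, "icbi", False, False),
    (14, "dcbi", True, True),
]


def rc_settings(matching_scenarios):
    """Return (scenario, R set, C set) for the highest-priority matching scenario."""
    candidates = [row for row in TABLE_5_8_603E if row[1] in matching_scenarios]
    if not candidates:
        raise ValueError("no scenario from Table 5-8 matches")
    _, scenario, r_set, c_set = min(candidates)  # lowest priority number wins
    return scenario, r_set, c_set
```

With this encoding, the stwcx. example from the text (protection violation and no reservation) resolves to the page protection violation row, so C is left unaltered.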
For more information, see "Page History Recording" in Chapter 7, "Memory Management," of the Programming Environments Manual.

5.4.2 Page Memory Protection

The 603e implements page memory protection as it is defined in Chapter 7, "Memory Management," in the Programming Environments Manual.

5.4.3 TLB Description

This section describes the hardware resources provided in the 603e to facilitate the page address translation process. Note that the hardware implementation of the MMU is not specified by the architecture, and while this description applies to the 603e, it does not necessarily apply to other PowerPC processors.

5.4.3.1 TLB Organization

Because the 603e has two MMUs (IMMU and DMMU) that operate in parallel, some of the MMU resources are shared, and some are actually duplicated (shadowed) in each MMU to maximize performance. Figure 5-7 shows the relationships between these resources within both the IMMU and DMMU and how the various portions of the effective address are used in the address translation process.
Figure 5-7. Segment Register and TLB Organization
[Figure: EA0-EA3 select one of the segment registers (0-15). The VSID from the selected segment register and EA4-EA14 are compared against line 0 and line 1 of the TLB set selected by EA15-EA19. A mux driven by the line 0/line 1 hit selects the RPN that forms PA0-PA19.]

While both MMUs can be accessed simultaneously (both sets of segment registers and TLBs can be accessed in the same clock), when there is an exception condition, only one exception is reported at a time. ITLB miss exceptions are reported when there are no more instructions to be dispatched or retired (the pipeline is empty), and DTLB miss conditions are reported when the load or store instruction is ready to be retired. Refer to Chapter 6, "Instruction Timing," for more detailed information about the internal pipelines and the reporting of exceptions.

As TLB entries are on-chip copies of PTEs in the page tables in memory, they are similar in structure. TLB entries consist of two words; the high-order word contains the VSID and API fields of the high-order word of the PTE, and the low-order word contains the RPN, the C bit, the WIMG bits, and the PP bits (as in the low-order word of the PTE). In order to uniquely identify a TLB entry as the required PTE, the TLB also contains five more bits of the page index, EA10-EA14 (in addition to the API bits of the PTE).

When an instruction or data access occurs, the effective address is routed to the appropriate MMU. EA0-EA3 select one of the 16 segment registers, and the remaining effective address bits and the virtual address from the segment register are passed to the TLB. EA15-EA19 then select two entries in the TLB; the valid bit is checked, and EA10-EA14, the VSID, and the API field (EA4-EA9) for the access are then compared with the corresponding values in the TLB entries. If one of the entries hits, the PP bits are checked for a protection violation, and the C bit is checked. If these bits do not cause an exception, the RPN value is passed to the memory subsystem and the WIMG bits are then used as attributes for the access.

Although address translation is disabled on a reset condition, the valid bits of the BAT array and TLB entries are not automatically cleared. Thus TLB entries must be explicitly cleared by the system software (with the tlbie instruction) before the valid entries are loaded and address translation is enabled. Also, note that the segment registers do not have a valid bit, and so they should also be initialized before translation is enabled.

5.4.3.2 TLB Entry Invalidation

For the PowerPC processors, such as the 603e, that implement TLB structures to maintain on-chip copies of the PTEs that are resident in physical memory, the optional tlbie instruction provides a way to invalidate the TLB entries. Note that the execution of the tlbie instruction in the 603e invalidates four entries: both the ITLB entries indexed by EA15-EA19 and both the indexed entries of the DTLB.

The architecture allows tlbie to optionally enable a TLB invalidate signaling mechanism in hardware so that other processors also invalidate their resident copies of the matching PTE.
The 603e does not signal the TLB invalidation to other processors nor does it perform any action when a TLB invalidation is performed by another processor.

The tlbsync instruction causes instruction execution to stop if the TLBISYNC signal is also asserted. If TLBISYNC is negated, instruction execution may continue or resume after the completion of a tlbsync instruction. Section 8.8.2, "TLBISYNC Input," describes the TLB synchronization mechanism in further detail.

The tlbia instruction is not implemented on the 603e and when its opcode is encountered, an illegal instruction program exception is generated. To invalidate all entries of both TLBs, 32 tlbie instructions must be executed, incrementing the value in EA15-EA19 by one each time. See Chapter 8, "Instruction Set," in the Programming Environments Manual for detailed information about the tlbie instruction.
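The TLB indexing described in Section 5.4.3.1 and the 32-step invalidation sweep described above can be modeled in a few lines. This is a behavioral sketch only: `TlbModel`, `tlb_fields`, and `invalidate_all` are hypothetical names, the model tracks nothing but valid bits, and real software would place each generated address in rB and execute tlbie. Bit numbering follows the PowerPC convention in which EA0 is the most-significant bit.

```python
# Behavioral sketch of 603e TLB indexing and tlbie-based invalidation.
# Names and structure are assumptions made for illustration.

def tlb_fields(ea):
    """Extract the EA fields used for segment selection and TLB lookup."""
    return {
        "segment": (ea >> 28) & 0xF,   # EA0-EA3: segment register number
        "api": (ea >> 22) & 0x3F,      # EA4-EA9: abbreviated page index
        "tag": (ea >> 17) & 0x1F,      # EA10-EA14: extra page index bits held in the TLB
        "index": (ea >> 12) & 0x1F,    # EA15-EA19: selects one of 32 two-entry sets
    }


class TlbModel:
    SETS = 32
    WAYS = 2

    def __init__(self):
        # Valid bits are not cleared by reset, so start with every entry "valid".
        self.valid = {
            name: [[True] * self.WAYS for _ in range(self.SETS)]
            for name in ("itlb", "dtlb")
        }

    def tlbie(self, ea):
        """Invalidate both ways of the indexed set in both TLBs; return entries cleared."""
        index = tlb_fields(ea)["index"]
        cleared = 0
        for tlb in self.valid.values():
            for way in range(self.WAYS):
                tlb[index][way] = False
                cleared += 1
        return cleared


def invalidate_all(model):
    """Issue 32 tlbie operations, stepping EA15-EA19 by one each time."""
    addresses = [index << 12 for index in range(TlbModel.SETS)]
    for ea in addresses:
        model.tlbie(ea)
    return addresses
```

Stepping the address by 0x1000 per iteration is what increments EA15-EA19 by one, since those bits sit immediately above the 12-bit page offset.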
5.4.4 Page Address Translation Summary

Figure 5-8 provides the detailed flow for the page address translation mechanism. The figure includes the checking of the N bit in the segment descriptor and then expands on the "TLB Hit" branch of Figure 5-6. The detailed flow for the "TLB Miss" branch of Figure 5-6 is described in Section 5.5.1, "Page Table Search Operation - Conceptual Flow." Note that, as in the case of block address translation, if execution of the dcbz instruction is attempted either in write-through mode or as cache-inhibited (W = 1 or I = 1), an alignment exception is generated. The checking of memory protection violation conditions for page address translation is described in Chapter 7, "Memory Management," in the Programming Environments Manual for 32-bit implementations.
Figure 5-8. Page Address Translation Flow for 32-Bit Implementations - TLB Hit
[Figure: flowchart. Elements recoverable from the figure: effective address generated; generate 52-bit virtual address from segment descriptor; compare virtual address with TLB entries; page table search operation (see Figure 5-9); TLB hit case; I-fetch with N bit set in segment descriptor (no-execute); page memory protection violation (see the Programming Environments Manual); check page memory protection violation conditions (see the Programming Environments Manual), with access prohibited and access permitted branches; store access with PTE[C] = 0; dcbz instruction with W or I = 1, leading to alignment exception; PA0-PA31 ← RPN || A20-A31; continue access to memory subsystem with WIMG bits from PTE.]
5.5 Page Table Search Operation

As stated earlier, the operating system must synthesize the table search algorithm for setting up the tables. In the case of the 603e, the TLB miss exception handlers also use this algorithm (with the assistance of some hardware-generated values) to load TLB entries when TLB misses occur as described in Section 5.5.2, "Implementation-Specific Table Search Operation."

5.5.1 Page Table Search Operation - Conceptual Flow

The table search process for a PowerPC processor varies slightly for 64- and 32-bit implementations. The main differences are the address ranges and PTE formats specified. An outline of the page table search process performed by a 32-bit implementation (such as the 603e) is as follows:

1. The 32-bit physical address of the primary PTEG is generated as described in Chapter 7, "Memory Management," in the Programming Environments Manual for 32-bit implementations.
2. The first PTE (PTE0) in the primary PTEG is read from memory. PTE reads should occur with an implied WIM memory/cache mode control bit setting of 0b001. Therefore, they are considered cacheable and burst in from memory and placed in the cache.
3. The PTE in the selected PTEG is tested for a match with the virtual page number (VPN) of the access. The VPN is the VSID concatenated with the page index field of the virtual address. For a match to occur, the following must be true:
   - PTE[H] = 0
   - PTE[V] = 1
   - PTE[VSID] = VA[0-23]
   - PTE[API] = VA[24-29]
4. If a match is not found, step 3 is repeated for each of the other seven PTEs in the primary PTEG. If a match is found, the table search process continues as described in step 8. If a match is not found within the eight PTEs of the primary PTEG, the address of the secondary PTEG is generated.
5. The first PTE (PTE0) in the secondary PTEG is read from memory.
   Again, because PTE reads typically have a WIM bit combination of 0b001, an entire cache line is burst into the on-chip cache.
6. The PTE in the selected secondary PTEG is tested for a match with the virtual page number (VPN) of the access. For a match to occur, the following must be true:
   - PTE[H] = 1
   - PTE[V] = 1
   - PTE[VSID] = VA[0-23]
   - PTE[API] = VA[24-29]
7. If a match is not found, step 6 is repeated for each of the other seven PTEs in the secondary PTEG.
8. If a match is found, the PTE is written into the on-chip TLB (if implemented, as in the 603e) and the R bit is updated in the PTE in memory (if necessary). If there is no memory protection violation, the C bit is also updated in memory and the table search is complete.
9. If a match is not found within the eight PTEs of the secondary PTEG, the search fails, and a page fault exception condition occurs (either an ISI exception or a DSI exception). Note that the software routines that implement this algorithm for the 603e must synthesize this condition by appropriately setting the bits in SRR1 (or DSISR) and branching to the ISI or DSI handler routine.

Reads from memory for table search operations should be performed as global (but not exclusive), cacheable operations, and can be loaded into the on-chip cache.

Figure 5-9 and Figure 5-10 provide conceptual flow diagrams of primary and secondary page table search operations, respectively, as described in the OEA for 32-bit processors. Recall that the architecture allows for implementations to perform the page table search operations automatically (in hardware) or software assist may be required, as is the case with the 603e. Also, the elements in the figure that apply to TLBs are shown as optional because TLBs are not required by the architecture.

Figure 5-9 shows the case of a dcbz instruction that is executed with W = 1 or I = 1, and that the R bit may be updated in memory (if required) before the operation is performed or the alignment exception occurs. The R bit may also be updated in the case of a memory protection violation.
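The primary/secondary search in steps 2 through 9 can be sketched as follows. This is a conceptual model, not the 603e handler code from Section 5.5.2: each PTEG is passed in as a list of eight (word 0, word 1) pairs rather than being located through the hash function, and the function names are hypothetical. The first-word layout used here (V in bit 0, VSID in bits 1-24, H in bit 25, API in bits 26-31, with bit 0 as the most-significant bit) is assumed from the 32-bit PTE format in the Programming Environments Manual. R and C bit updates and protection checks (step 8) are omitted.

```python
# Conceptual sketch of the 32-bit page table search (steps 2-9).
# PTE word 0 layout is an assumption taken from the 32-bit PTE format:
# V (bit 0) || VSID (bits 1-24) || H (bit 25) || API (bits 26-31).

def pte_word0(vsid, api, h):
    """Build the first word of a valid PTE for the given VSID, API, and hash selector."""
    return (1 << 31) | ((vsid & 0xFFFFFF) << 7) | ((h & 1) << 6) | (api & 0x3F)


def search_pteg(pteg, vsid, api, h):
    """Test each PTE in one PTEG; a match requires V = 1 and matching H, VSID, API."""
    wanted = pte_word0(vsid, api, h)
    for word0, word1 in pteg:
        if word0 == wanted:
            return word0, word1
    return None


def table_search(primary_pteg, secondary_pteg, vsid, api):
    """Search the primary PTEG (H = 0), then the secondary PTEG (H = 1).

    Returns the matching PTE, or None to model the page fault of step 9.
    """
    hit = search_pteg(primary_pteg, vsid, api, h=0)
    if hit is None:
        hit = search_pteg(secondary_pteg, vsid, api, h=1)
    return hit
```

Comparing the whole first word in one step mirrors the role described for the ICMP and DCMP registers, which hold the word to be compared with the first word of a PTE.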
Figure 5-9. Primary Page Table Search - Conceptual Flow
[Figure: flowchart. Elements recoverable from the figure: primary page table search; generate PA using primary hash function (PA ← base PA of PTEG); fetch PTE (64 bits) from PA; test PTE[VSID, API, H, V] = segment descriptor[VSID], EA[API], 0, 1; PA ← PA + 8 (fetch next PTE in PTEG); last PTE in PTEG, then perform secondary page table search; secondary page table search hit (from Figure 5-10); PTE[R] = 0: PTE[R] ← 1 and R_flag ← 1; write PTE into TLB (optional); check memory protection violation conditions, with access permitted and access prohibited branches; memory protection violation, with PTE[R] ← 1 (update PTE[R] in memory) when R_flag = 1; store operation with PTE[C] = 0: TLB[PTE[C]] ← 1, PTE[C] ← 1 (update PTE[C] in memory), and PTE[R] ← 1 (update PTE[R] in memory); dcbz instruction with W or I = 1: perform operation to memory or take alignment exception; byte write to update PTE[R] in memory when R_flag = 1; page table search complete.]
Figure 5-10. Secondary Page Table Search Flow - Conceptual Flow
[Figure: flowchart. Elements recoverable from the figure: secondary page table search; generate PA using secondary hash function (PA ← base PA of PTEG); fetch PTE (64 bits) from PA; test PTE[VSID, API, H, V] = segment descriptor[VSID], EA[API], 1, 1; secondary page table search hit (see Figure 5-9); PA ← PA + 8 (fetch next PTE in PTEG); last PTE in PTEG, then page fault; instruction access: set SRR1[1] = 1, ISI exception; data access: set DSISR[1] = 1, DSI exception.]

5.5.2 Implementation-Specific Table Search Operation

The 603e has a set of implementation-specific registers, exceptions, and instructions that facilitate very efficient software searching of the page tables in memory. This section describes those resources and provides three example code sequences that can be used in a 603e system for an efficient search of the translation tables in software. These three code sequences can be used as handlers for the three exceptions requiring access to the PTEs in the page tables in memory: instruction TLB miss, data TLB miss on load, and data TLB miss on store exceptions.
5.5.2.1 Resources for Table Search Operations

In addition to setting up the translation page tables in memory, the system software must assist the processor in loading PTEs into the on-chip TLBs. When a required TLB entry is not found in the appropriate TLB, the processor vectors to one of the three TLB miss exception handlers so that the software can perform a table search operation and load the TLB. When this occurs, the processor automatically saves information about the access and the executing context. Table 5-9 provides a summary of the implementation-specific exceptions, registers, and instructions that can be used by the TLB miss exception handler software in 603e systems. Refer to Chapter 4, "Exceptions," for more information about exception processing.

Table 5-9. Implementation-Specific Resources for Table Search Operations

Resource: Name
    Description

Exceptions: Instruction TLB miss exception (vector offset 0x1000)
    No matching entry found in ITLB.
Exceptions: Data TLB miss on load exception (vector offset 0x1100)
    No matching entry found in DTLB for a load data access.
Exceptions: Data TLB miss on store exception, also caused when changed bit must be updated (vector offset 0x1200)
    No matching entry found in DTLB for a store data access, or matching DTLB entry has C = 0 and access is a store.
Registers: IMISS and DMISS
    When a TLB miss exception occurs, the IMISS or DMISS register contains the 32-bit effective address of the instruction or data access that caused the miss exception.
Registers: ICMP and DCMP
    The ICMP and DCMP registers contain the word to be compared with the first word of a PTE in the table search software routine to determine if a PTE contains the address translation for the instruction or data access. The contents of ICMP and DCMP are automatically derived by the 603e when a TLB miss exception occurs.
Registers: HASH1 and HASH2
    The HASH1 and HASH2 registers contain the primary and secondary PTEG addresses that correspond to the address causing a TLB miss. These PTEG addresses are automatically derived by the 603e by performing the primary and secondary hashing function on the contents of IMISS or DMISS, for an ITLB or DTLB miss exception, respectively.
Registers: RPA
    The system software loads a TLB entry by loading the second word of the matching PTE entry into the RPA register and then executing the tlbli or tlbld instruction (for loading the ITLB or DTLB, respectively).
In addition, the 603e contains the following features that do not specifically control the 603e MMU but that are implemented to increase performance and flexibility in the software table search routines whenever one of the three TLB miss exceptions occurs:

- Temporary GPR0-GPR3. These registers are available as r0-r3 when MSR[TGPR] is set. The 603e automatically sets MSR[TGPR] for these cases, giving these exception handlers four scratchpad registers without having to save or restore this part of the machine state that existed when the exception occurred. Note that MSR[TGPR] is cleared when the rfi instruction is executed because the old MSR value (with MSR[TGPR] = 0) saved in SRR1 is restored. Refer to Section 5.5.2.2, "Software Table Search Operation," for code examples that take advantage of these registers.
- The 603e also automatically saves the values of CR[CR0] of the executing context to SRR1[0-3]. Thus, the exception handler can set CR[CR0] bits and branch accordingly in the exception handler routine, without having to save the existing CR[CR0] bits. However, the exception handler must restore these bits to CR[CR0] before executing the rfi instruction.
- Also saved in SRR1 are two bits identifying the type of miss (SRR1[D/I] identifies instruction or data, and SRR1[L/S] identifies a load or store). Additionally, SRR1[WAY] identifies the associativity class of the TLB entry selected for replacement by the LRU algorithm. The software can change this value, effectively overriding the replacement algorithm. Finally, the SRR1[KEY] bit is used by the table search software to determine if there is a protection violation associated with the access (useful on data write misses for determining if the C bit should be updated in the table).

Table 5-10 summarizes the SRR1 bits updated whenever one of the three TLB miss exceptions occurs.
table 5-9. implementation-specific resources for table search operations

resource       name       description
instructions   tlbli rb   loads the contents of the icmp and rpa registers into the itlb entry selected by the effective address and srr1[way]
               tlbld rb   loads the contents of the dcmp and rpa registers into the dtlb entry selected by the effective address and srr1[way]
table 5-10. implementation-specific srr1 bits

bit number   name   function
0-3          crf0   condition register field 0 bits
12           key    key for tlb miss (either ks or kp from segment register, depending on whether the access is a user or supervisor access)
13           d/i    set if instruction tlb miss
14           way    next tlb set to be replaced (set per lru)
15           s/l    set if data tlb miss was for a load instruction

the key bit saved in srr1 is derived as shown in figure 5-11.

figure 5-11. derivation of key bit for srr1
select key from segment register:
    if msr[pr] = 0, key = ks
    if msr[pr] = 1, key = kp

the remainder of this section describes the format of the implementation-specific sprs that are not defined by the powerpc architecture, but are used by the tlb miss exception handlers. these registers can be accessed by supervisor-level instructions only. any attempt to access these sprs with user-level instructions results in a privileged instruction program exception. because dmiss, imiss, dcmp, icmp, hash1, hash2, and rpa are used to access the translation tables for software table search operations, they should only be accessed when address translation is disabled (that is, msr[ir] = 0 and msr[dr] = 0). note that msr[ir] and msr[dr] are cleared by the processor whenever an exception occurs.

5.5.2.1.1 data and instruction tlb miss address registers (dmiss and imiss)

the dmiss and imiss registers have the same format as shown in figure 5-12. they are loaded automatically upon a data or instruction tlb miss. dmiss and imiss contain the effective page address of the access that caused the tlb miss exception. the contents are used by the processor when calculating the values of hash1 and hash2, and by the tlbld and tlbli instructions when loading a new tlb entry. note that the 603e always loads a big-endian address into the dmiss register. these registers are read-only to the software.

figure 5-12. dmiss and imiss registers
bits 0-31: effective page address
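the srr1 fields of table 5-10 and the key selection of figure 5-11 can be modeled in a few lines. the following is only a sketch in python, not code from this manual: the function names are hypothetical, bit 0 is taken as the most significant bit of a 32-bit word, and the saved msr field in bits 16-31 follows the comments in the example handlers of section 5.5.2.2.2.

```python
def bit(value, n):
    """return bit n of a 32-bit value, where bit 0 is the most significant bit."""
    return (value >> (31 - n)) & 1


def decode_tlb_miss_srr1(srr1):
    """split srr1 into the fields of table 5-10 (hypothetical helper)."""
    return {
        "crf0": (srr1 >> 28) & 0xF,   # bits 0-3: saved cr0 field
        "key": bit(srr1, 12),         # ks or kp, see select_key()
        "d_i": bit(srr1, 13),         # set if instruction tlb miss
        "way": bit(srr1, 14),         # next tlb set to be replaced
        "s_l": bit(srr1, 15),         # load/store indication for data misses
        "saved_msr": srr1 & 0xFFFF,   # bits 16-31: saved msr bits
    }


def select_key(msr_pr, ks, kp):
    """figure 5-11: key = ks for supervisor (pr = 0), kp for user (pr = 1)."""
    return kp if msr_pr else ks
```

for example, decode_tlb_miss_srr1(0x800A1032) reports crf0 = 8, key = 1 and way = 1, with d_i and s_l clear.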
5.5.2.1.2 data and instruction tlb compare registers (dcmp and icmp)

the dcmp and icmp registers are shown in figure 5-13. these registers contain the first word of the required pte. the contents are constructed automatically from the contents of the segment registers and the effective address (dmiss or imiss) when a tlb miss exception occurs. each pte read from the tables in memory during the table search process should be compared with this value to determine whether or not the pte is a match. upon execution of a tlbld or tlbli instruction, the contents of the dcmp or icmp register are loaded into the first word of the selected tlb entry.

figure 5-13. dcmp and icmp registers
bit 0: v | bits 1-24: vsid | bit 25: h | bits 26-31: api

table 5-11 describes the bit settings for the dcmp and icmp registers.

table 5-11. dcmp and icmp bit settings

bits    name   description
0       v      valid bit. set by the processor on a tlb miss exception.
1-24    vsid   virtual segment id. copied from vsid field of corresponding segment register.
25      h      hash function identifier. cleared by the processor on a tlb miss exception.
26-31   api    abbreviated page index. copied from api of effective address.

5.5.2.1.3 primary and secondary hash address registers (hash1 and hash2)

hash1 and hash2 contain the physical addresses of the primary and secondary ptegs for the access that caused the tlb miss exception. only bits 7-25 differ between them. for convenience, the processor automatically constructs the full physical address by routing bits 0-6 of sdr1 into hash1 and hash2 and clearing the lower six bits. these registers are read-only and are constructed from the contents of the dmiss or imiss register. the format for the hash1 and hash2 registers is shown in figure 5-14.

figure 5-14. hash1 and hash2 registers
bits 0-6: htaborg | bits 7-25: hashed page address | bits 26-31: reserved (0 0 0 0 0 0)
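as a rough illustration of how these values fit together, the sketch below builds a compare value with the layout of table 5-11 and the two pteg addresses with the layout of figure 5-14. it is written in python with hypothetical function names. this section does not spell out the hashing function itself, so the sketch assumes the standard 32-bit powerpc page table hash (low-order 19 bits of the vsid xor the 16-bit page index, secondary hash as the ones complement, htabmask taken from sdr1[23-31]) and takes the api from effective address bits 4-9.

```python
def compare_value(vsid, ea, h=0):
    """first pte word as held in dcmp/icmp: v | vsid | h | api (table 5-11)."""
    api = (ea >> 22) & 0x3F                      # effective address bits 4-9
    return (1 << 31) | ((vsid & 0xFFFFFF) << 7) | ((h & 1) << 6) | api


def pteg_addresses(sdr1, vsid, ea):
    """return (hash1, hash2) style pteg addresses; assumes the 32-bit powerpc hash."""
    htaborg = (sdr1 >> 16) & 0xFFFF              # sdr1 bits 0-15
    htabmask = sdr1 & 0x1FF                      # sdr1 bits 23-31
    page_index = (ea >> 12) & 0xFFFF             # effective address bits 4-19
    primary = (vsid & 0x7FFFF) ^ page_index      # 19-bit primary hash
    secondary = ~primary & 0x7FFFF               # ones complement of the primary hash

    def pteg(hash_value):
        upper = (htaborg >> 9) << 25                                       # bits 0-6
        middle = ((htaborg & 0x1FF) | ((hash_value >> 10) & htabmask)) << 16  # bits 7-15
        lower = (hash_value & 0x3FF) << 6                                  # bits 16-25
        return upper | middle | lower            # bits 26-31 stay zero

    return pteg(primary), pteg(secondary)
```

with a 64-kbyte table at 0x00100000 (sdr1 = 0x00100000), vsid 0x123 and effective address 0x0C345678, this gives a compare value of 0x800091B0 and pteg addresses 0x00109980 and 0x00106640; the two addresses differ only in bits 7-25, as stated above.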
table 5-12 describes the bit settings of the hash1 and hash2 registers.

table 5-12. hash1 and hash2 bit settings

bits    name                  description
0-6     htaborg[0-6]          copy of the upper 7 bits of the htaborg field from sdr1
7-25    hashed page address   address bits 7-25 of the pteg to be searched
26-31   -                     reserved

5.5.2.1.4 required physical address register (rpa)

the rpa is shown in figure 5-15. during a page table search operation, the software must load the rpa with the second word of the correct pte. when the tlbld or tlbli instruction is executed, data from the imiss and icmp (or dmiss and dcmp) and the rpa registers is merged and loaded into the selected tlb entry. the tlb entry is selected by the effective address of the access (loaded by the table search software from the dmiss or imiss register) and the srr1[way] bit.

figure 5-15. required physical address (rpa) register
bits 0-19: rpn | bits 20-22: reserved | bit 23: r | bit 24: c | bits 25-28: wimg | bit 29: reserved | bits 30-31: pp

table 5-13 describes the bit settings of the rpa register.

table 5-13. rpa bit settings

bits    name   description
0-19    rpn    physical page number from pte
20-22   -      reserved
23      r      referenced bit from pte
24      c      changed bit from pte
25-28   wimg   memory/cache access attribute bits
29      -      reserved
30-31   pp     page protection bits from pte

5.5.2.2 software table search operation

when a tlb miss occurs, the instruction or data mmu loads the imiss or dmiss register, respectively, with the effective address of the access. the processor completes all instructions dispatched prior to the exception, status information is saved in srr1, and one of the three tlb miss exceptions is taken. in addition, the processor loads the icmp or dcmp register with the value to be compared with the first word of ptes in the tables in memory.
the software should then access the first pte at the address pointed to by hash1. the first word of the pte should be loaded and compared to the contents of dcmp or icmp. if there is a match, then the required pte has been found and the second word of the pte is loaded from memory into the rpa register. then the tlbli or tlbld instruction is executed, which loads the contents of the icmp (or dcmp) and rpa registers into the selected tlb entry. the tlb entry is selected by the effective address of the access and the srr1[way] bit.

if the compare did not result in a match, however, the pteg address is incremented to point to the next pte in the table and the above sequence is repeated. if none of the eight ptes in the primary pteg matches, the sequence is then repeated using the secondary pteg (at the address contained in hash2).

if the pte is also not found in the eight entries of the secondary pteg, a page fault condition exists, and a page fault exception must be synthesized. thus the appropriate bits must be set in srr1 (or dsisr) and the tlb miss handler must branch to either the isi or dsi exception handler, which handles the page fault condition.

this section provides a flow diagram outlining some example software that can be used to handle the three tlb miss exceptions, as well as some example assembly language that implements that flow.

5.5.2.2.1 flow for example exception handlers

figure 5-16 shows the flow for the example tlb miss exception handlers. the flow shown is common for the three exception handlers, except that the imiss and icmp registers are used for the instruction tlb miss exception while the dmiss and dcmp registers are used for the two data tlb miss exceptions. also, for the cases of store instructions that cause either a tlb miss or require a table search operation to update the c bit, the flow shows that the c bit is set in both the tlb entry and the pte in memory. finally, in the case of a page fault (no pte found in the table search operation), the setup for the isi or dsi exception is slightly different for these two cases.

figure 5-17 shows the flow for checking the r and c bits and setting them appropriately, and figure 5-18 shows the flow for synthesizing a page fault exception when no pte is found. figure 5-19 shows the flow for managing the cases of a tlb miss on an instruction access to guarded memory, and a tlb miss when c = 0 and a protection violation exists. the setup for these protection violation exceptions is very similar to that of page fault conditions (as shown in figure 5-18) except that different bits in srr1 (and dsisr) are set.
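the search order described in section 5.5.2.2 (eight ptes in the primary pteg, then eight in the secondary pteg with the h bit set in the compare value) can be sketched as follows. this is an illustrative python model, not the handler code of this manual: table_search and read_word are hypothetical names, and memory is represented by any callable that returns the 32-bit word at a physical address.

```python
PTES_PER_PTEG = 8   # each pteg holds eight 8-byte ptes
H_BIT = 0x40        # bit 25 of the compare value (hash function identifier)


def table_search(read_word, hash1, hash2, compare_value):
    """return (pte_address, second_word) for the matching pte, or None on a page fault."""
    attempts = ((hash1, compare_value), (hash2, compare_value | H_BIT))
    for pteg_address, wanted in attempts:
        for index in range(PTES_PER_PTEG):
            pte_address = pteg_address + 8 * index
            if read_word(pte_address) == wanted:
                # match: the second word is what the handler would load into rpa
                return pte_address, read_word(pte_address + 4)
    return None     # not in either pteg: a page fault must be synthesized
```

a caller would pass the values read from hash1, hash2, and icmp or dcmp; a result of None corresponds to the branch to the isi or dsi setup.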
figure 5-16. flow for example software table search operation

tlb miss exception:
1. save old counter and cr0 bits.
2. set counter: cnt ← 8. load primary pteg pointer: ptr ← hash1 - 8. compare_value ← icmp/dcmp.
3. read lower word of next pte from memory: ptr ← ptr + 8; temp ← (ptr).
4. if temp = compare_value, go to step 7. otherwise, cnt ← cnt - 1.
5. if cnt ≠ 0, go to step 3.
6. otherwise: if compare_value[h] = 1, the secondary hash is complete; set up for page fault exception (see figure 5-18). otherwise, load secondary pteg pointer: ptr ← hash2 - 8; compare_value[h] ← 1; set counter: cnt ← 8; go to step 3.
7. read upper word of pte: temp ← (ptr - 4).
8. if instruction access and temp[g] = 1, set up for protection violation exception (see figure 5-19). otherwise, check r, c bits and set as needed (see figure 5-17).
9. rpa ← temp; effective address ← imiss/dmiss; load tlb entry: tlbli (or tlbld).
10. restore old counter and cr0 bits.
11. return to executing program: rfi.
figure 5-17. check and set r and c bit flow

check r, c bits and set as needed:
handler for data store op:
    if temp[c] = 0, check protection:
        pp = 00 or 01: if srr1[key] = 1, set up for protection violation (see figure 5-19); otherwise, set r, c bits.
        pp = 10 or 11: if pp = 11, set up for protection violation (see figure 5-19); if pp = 10, set r, c bits.
        set r, c bits: temp ← temp or 0x180. store bytes 6, 7 of pte to memory: (ptr - 2) ← temp[bytes 6, 7]. return to tlb miss exception flow (see figure 5-16).
    otherwise, return to tlb miss exception flow (see figure 5-16).
otherwise:
    set r bit: temp ← temp or 0x100. store byte 7 of pte to memory: (ptr - 2) ← temp[byte 7]. return to tlb miss exception flow (see figure 5-16).
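the decisions in figure 5-17 reduce to a small function. the sketch below is a python model with hypothetical names; it works on the second word of the pte (r = 0x100, c = 0x080, pp in the two low-order bits, as in table 5-13) and returns the updated word together with a flag that stands for "set up for protection violation".

```python
R_BIT = 0x100   # referenced bit (bit 23 of the second pte word)
C_BIT = 0x080   # changed bit (bit 24 of the second pte word)


def check_and_set_rc(word1, store_handler, key):
    """return (updated_word, protection_violation) following figure 5-17."""
    pp = word1 & 0x3
    if not store_handler:
        return word1 | R_BIT, False           # other handlers only set r
    if word1 & C_BIT:
        return word1, False                   # c already set: nothing to update
    if pp == 0b11:
        return word1, True                    # pp = 11 is a violation for a store
    if pp in (0b00, 0b01) and key:
        return word1, True                    # pp = 00 or 01 with key = 1
    return word1 | R_BIT | C_BIT, False       # store allowed: set r and c
```

the key argument is the srr1[key] bit; the caller is expected to write the updated r and c bits back to the pte in memory when no violation is reported.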
figure 5-18. page fault setup flow

set up for page fault exception:
data tlb miss handlers:
    dsisr[6] ← srr1[15]
    dsisr[1] ← 1
    clear upper bits of srr1: srr1 ← srr1 and 0xffff
    dtemp ← dmiss
    if srr1[31] = 1 (little-endian mode): dtemp ← dtemp xor 0x07
    dar ← dtemp
    restore cr0 bits
    msr[tgpr] ← 0
    branch to dsi exception handler
instruction tlb miss handlers:
    clear upper bits of srr1: srr1 ← srr1 and 0xffff
    srr1[1] ← 1
    restore cr0 bits
    msr[tgpr] ← 0
    branch to isi exception handler
figure 5-19. setup for protection violation exceptions

set up for protection violation exceptions:
data tlb miss handlers (data access to protected memory; c = 0):
    dsisr[6] ← srr1[15]
    dsisr[4] ← 1
    clear upper bits of srr1: srr1 ← srr1 and 0xffff
    dtemp ← dmiss
    if srr1[31] = 1 (little-endian mode): dtemp ← dtemp xor 0x07
    dar ← dtemp
    restore cr0 bits
    msr[tgpr] ← 0
    branch to dsi exception handler
instruction tlb miss handler (instruction access to guarded memory):
    clear upper bits of srr1: srr1 ← srr1 and 0xffff
    srr1[4] ← 1
    restore cr0 bits
    msr[tgpr] ← 0
    branch to isi exception handler
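the register values produced by figures 5-18 and 5-19 can be checked with a short model. the python sketch below uses hypothetical function names and bit 0 as the most significant bit, so dsisr[1] = 0x40000000, dsisr[4] = 0x08000000 and dsisr[6] = 0x02000000; it computes only the values and leaves out the cr0 restore, the msr[tgpr] change, and the branch.

```python
def setup_dsi(srr1, dmiss, page_fault):
    """return (new_srr1, dsisr, dar) for a synthesized dsi exception."""
    dsisr = ((srr1 >> 16) & 1) << 25                    # dsisr[6] <- srr1[15]
    dsisr |= (1 << 30) if page_fault else (1 << 27)     # dsisr[1] or dsisr[4]
    dar = (dmiss ^ 0x07) if (srr1 & 1) else dmiss       # srr1[31] = 1: little-endian
    return srr1 & 0xFFFF, dsisr, dar


def setup_isi(srr1, page_fault):
    """return the new srr1 for a synthesized isi exception."""
    new_srr1 = srr1 & 0xFFFF                            # clear upper bits of srr1
    new_srr1 |= (1 << 30) if page_fault else (1 << 27)  # srr1[1] or srr1[4]
    return new_srr1
```

for instance, setup_dsi(0x80019033, 0x1003, True) yields srr1 = 0x9033, dsisr = 0x42000000 and dar = 0x1004.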
5.5.2.2.2 code for example exception handlers

this section provides some assembly language examples that implement the flow diagrams described above. note that although these routines fit into a few cache lines, they are supplied only as a functional example; they could be further optimized for faster performance.

# tlb software load for 603e
#
# new instructions:
#   tlbld - write the dtlb with the pte in rpa reg
#   tlbli - write the itlb with the pte in rpa reg
# new sprs:
#   dmiss - address of dstream miss
#   imiss - address of istream miss
#   hash1 - returns primary hash pteg address
#   hash2 - returns secondary hash pteg address
#   icmp  - returns the primary istream compare value
#   dcmp  - returns the primary dstream compare value
#   rpa   - the second word of pte used by tlblx
#
# gpr r0..r3 are shadowed
#
# there are three flows.
#   tlbdatamiss  - tlb miss on data load
#   tlbceq0      - tlb miss on data store or store with tlb change bit == 0
#   tlbinstrmiss - tlb miss on instruction fetch
#+
# place labels for rel branches
#-
#.machine ppc_603e
.set r0, 0
.set r1, 1
.set r2, 2
.set r3, 3
.set dmiss, 976          # spr numbers of the 603e table search registers
.set dcmp, 977
.set hash1, 978
.set hash2, 979
.set imiss, 980
.set icmp, 981
.set rpa, 982
.set c0, 0
.set dar, 19
.set dsisr, 18
.set srr0, 26
.set srr1, 27
.csect tlbmiss[pr]
vec0:
.globl vec0
.org vec0+0x300
vec300:
.org vec0+0x400
vec400:
#+
# instruction tlb miss flow
# entry:
#   vec = 1000
#   srr0 -> address of instruction that missed
#   srr1 -> 0:3=cr0 4=lru way bit 16:31 = saved msr
#   msr[tgpr] -> 1
#   imiss -> ea that missed
#   icmp -> the compare value for the va that missed
#   hash1 -> pointer to first hash pteg
#   hash2 -> pointer to second hash pteg
#
# register usage:
#   r0 is saved counter
#   r1 is junk
#   r2 is pointer to pteg
#   r3 is current compare value
#-
.org vec0+0x1000
tlbinstrmiss:
    mfspr r2, hash1          # get first pointer
    addi r1, 0, 8            # load 8 for counter
    mfctr r0                 # save counter
    mfspr r3, icmp           # get first compare value
    addi r2, r2, -8          # pre dec the pointer
im0:
    mtctr r1                 # load counter
im1:
    lwzu r1, 8(r2)           # get next pte
    cmpw c0, r1, r3          # see if found pte
    bdnzf 2, im1             # dec count br if cmp ne (cr0[eq] clear) and if count not zero
    bne instrsechash         # if not found set up second hash or exit
    lwz r1, 4(r2)            # load tlb entry lower-word
    andi. r3, r1, 8          # check g-bit
    bne doisip               # if guarded, take an isi
    mtctr r0                 # restore counter
    mfspr r0, imiss          # get the miss address for the tlbli
    mfspr r3, srr1           # get the saved cr0 bits
    mtcrf 0x80, r3           # restore cr0
    mtspr rpa, r1            # set the pte
    ori r1, r1, 0x100        # set reference bit
    srwi r1, r1, 8           # get the pte byte that holds the r bit
    tlbli r0                 # load the itlb
    stb r1, 6(r2)            # update page table
    rfi                      # return to executing program
#+
# register usage:
#   r0 is saved counter
#   r1 is junk
#   r2 is pointer to pteg
#   r3 is current compare value
#-
instrsechash:
    andi. r1, r3, 0x0040     # see if we have done second hash
    bne doisi                # if so, go to isi exception
    mfspr r2, hash2          # get the second pointer
    ori r3, r3, 0x0040       # change the compare value
    addi r1, 0, 8            # load 8 for counter
    addi r2, r2, -8          # pre dec for update on load
    b im0                    # try second hash
#+
# entry not found: synthesize an isi exception
# guarded memory protection violation: synthesize an isi exception
# entry:
#   r0 is saved counter
#   r1 is junk
#   r2 is pointer to pteg
#   r3 is current compare value
#-
doisip:
    mfspr r3, srr1           # get srr1
    andi. r2, r3, 0xffff     # clean upper srr1
    addis r2, r2, 0x0800     # or in srr1[4] = 1 to flag prot violation
    b isi1
doisi:
    mfspr r3, srr1           # get srr1
    andi. r2, r3, 0xffff     # clean srr1
    addis r2, r2, 0x4000     # or in srr1[1] = 1 to flag pte not found
isi1:
    mtctr r0                 # restore counter (on both paths)
    mtspr srr1, r2           # set srr1
    mfmsr r0                 # get msr
    xoris r0, r0, 0x2        # flip the msr[tgpr] bit
    mtcrf 0x80, r3           # restore cr0
    mtmsr r0                 # flip back to the native gprs
    b vec400                 # go to instr. access exception
#
#+
# data tlb miss flow
# entry:
#   vec = 1100
#   srr0 -> address of instruction that caused data tlb miss
#   srr1 -> 0:3=cr0 4=lru way bit 5=1 if store 16:31 = saved msr
#   msr[tgpr] -> 1
#   dmiss -> ea that missed
#   dcmp -> the compare value for the va that missed
#   hash1 -> pointer to first hash pteg
#   hash2 -> pointer to second hash pteg
#
# register usage:
#   r0 is saved counter
#   r1 is junk
#   r2 is pointer to pteg
#   r3 is current compare value
#-
.csect tlbmiss[pr]
.org vec0+0x1100
tlbdatamiss:
    mfspr r2, hash1          # get first pointer
    addi r1, 0, 8            # load 8 for counter
    mfctr r0                 # save counter
    mfspr r3, dcmp           # get first compare value
    addi r2, r2, -8          # pre dec the pointer
dm0:
    mtctr r1                 # load counter
dm1:
    lwzu r1, 8(r2)           # get next pte
    cmpw c0, r1, r3          # see if found pte
    bdnzf 2, dm1             # dec count br if cmp ne (cr0[eq] clear) and if count not zero
    bne datasechash          # if not found set up second hash or exit
    lwz r1, 4(r2)            # load tlb entry lower-word
    mtctr r0                 # restore counter
    mfspr r0, dmiss          # get the miss address for the tlbld
    mfspr r3, srr1           # get the saved cr0 bits
    mtcrf 0x80, r3           # restore cr0
    mtspr rpa, r1            # set the pte
    ori r1, r1, 0x100        # set reference bit
    srwi r1, r1, 8           # get the pte byte that holds the r bit
    tlbld r0                 # load the dtlb
    stb r1, 6(r2)            # update page table
    rfi                      # return to executing program
#+
# register usage:
#   r0 is saved counter
#   r1 is junk
#   r2 is pointer to pteg
#   r3 is current compare value
#-
datasechash:
    andi. r1, r3, 0x0040     # see if we have done second hash
    bne dodsi                # if so, go to dsi exception
    mfspr r2, hash2          # get the second pointer
    ori r3, r3, 0x0040       # change the compare value
    addi r1, 0, 8            # load 8 for counter
    addi r2, r2, -8          # pre dec for update on load
    b dm0                    # try second hash
#
#+
# c=0 in dtlb and dtlb miss on store flow
# entry:
#   vec = 1200
#   srr0 -> address of store that caused the exception
#   srr1 -> 0:3=cr0 4=lru way bit 5=1 16:31 = saved msr
#   msr[tgpr] -> 1
#   dmiss -> ea that missed
#   dcmp -> the compare value for the va that missed
#   hash1 -> pointer to first hash pteg
#   hash2 -> pointer to second hash pteg
#
# register usage:
#   r0 is saved counter
#   r1 is junk
#   r2 is pointer to pteg
#   r3 is current compare value
#-
.csect tlbmiss[pr]
.org vec0+0x1200
tlbceq0:
    mfspr r2, hash1          # get first pointer
    addi r1, 0, 8            # load 8 for counter
    mfctr r0                 # save counter
    mfspr r3, dcmp           # get first compare value
    addi r2, r2, -8          # pre dec the pointer
ceq0:
    mtctr r1                 # load counter
ceq1:
    lwzu r1, 8(r2)           # get next pte
    cmpw c0, r1, r3          # see if found pte
    bdnzf 2, ceq1            # dec count br if cmp ne (cr0[eq] clear) and if count not zero
    bne ceq0sechash          # if not found set up second hash or exit
    lwz r1, 4(r2)            # load tlb entry lower-word
    andi. r3, r1, 0x80       # check the c-bit
    beq ceq0chkprot          # if (c==0) go check protection modes
ceq2:
    mtctr r0                 # restore counter
    mfspr r0, dmiss          # get the miss address for the tlbld
    mfspr r3, srr1           # get the saved cr0 bits
    mtcrf 0x80, r3           # restore cr0
    mtspr rpa, r1            # set the pte
    tlbld r0                 # load the dtlb
    rfi                      # return to executing program
#+
# register usage:
#   r0 is saved counter
#   r1 is junk
#   r2 is pointer to pteg
#   r3 is current compare value
#-
ceq0sechash:
    andi. r1, r3, 0x0040     # see if we have done second hash
    bne dodsi                # if so, go to dsi exception
    mfspr r2, hash2          # get the second pointer
    ori r3, r3, 0x0040       # change the compare value
    addi r1, 0, 8            # load 8 for counter
    addi r2, r2, -8          # pre dec for update on load
    b ceq0                   # try second hash
#+
# entry found and pte(c-bit==0):
# (check protection before setting pte(c-bit))
# register usage:
#   r0 is saved counter
#   r1 is pte entry
#   r2 is pointer to pteg
#   r3 is trashed
#-
ceq0chkprot:
    rlwinm. r3, r1, 30, 0, 1 # test pp
    bge- chk0                # if (pp==00 or pp==01) goto chk0:
    andi. r3, r1, 1          # test the low-order pp bit
    beq+ chk2                # pp==10: store is allowed
    b dodsip                 # else (pp==11) dsip
chk0:
    mfspr r3, srr1           # get old msr
    andis. r3, r3, 0x0008    # test the key bit (srr1 bit 12)
    beq chk2                 # if (key==0) goto chk2:
    b dodsip                 # else dsip
chk2:
    ori r1, r1, 0x180        # set reference and change bit
    sth r1, 6(r2)            # update page table (bytes 6 and 7 of the matching pte)
    b ceq2                   # and back we go
#
#+
# entry not found: synthesize a dsi exception
# entry:
#   r0 is saved counter
#   r1 is junk
#   r2 is pointer to pteg
#   r3 is current compare value
#-
dodsi:
    mfspr r3, srr1           # get srr1
    rlwinm r1, r3, 9, 6, 6   # get srr1[15] to bit 6 for load/store, zero rest
    addis r1, r1, 0x4000     # or in dsisr[1] = 1 to flag pte not found
    b dsi1
dodsip:
    mfspr r3, srr1           # get srr1
    rlwinm r1, r3, 9, 6, 6   # get srr1[15] to bit 6 for load/store, zero rest
    addis r1, r1, 0x0800     # or in dsisr[4] = 1 to flag prot violation
dsi1:
    mtctr r0                 # restore counter
    andi. r2, r3, 0xffff     # clear upper bits of srr1
    mtspr srr1, r2           # set srr1
    mtspr dsisr, r1          # load the dsisr
    mfspr r1, dmiss          # get miss address
    rlwinm. r2, r2, 0, 31, 31 # test le bit
    beq dsi2                 # skip the fixup unless little endian
    xori r1, r1, 0x07        # de-mung the data address
dsi2:
    mtspr dar, r1            # put in dar
    mfmsr r0                 # get msr
    xoris r0, r0, 0x2        # flip the msr[tgpr] bit
    mtcrf 0x80, r3           # restore cr0
    mtmsr r0                 # flip back to the native gprs
    b vec300                 # branch to dsi exception

5.5.3 page table updates

when tlbs are implemented (as in the 603e) they are defined as noncoherent caches of the page tables. tlb entries must be flushed explicitly with the tlb invalidate entry instruction (tlbie) whenever the corresponding pte is modified. since the 603e is intended primarily for uniprocessor environments, it does not provide coherency of tlbs between multiple processors. if the 603e is used in a multiprocessor environment where tlb coherency is required, all synchronization must be implemented in software.

processors may write referenced and changed bits with unsynchronized, atomic byte store operations. note that the v, r, and c bits each reside in a distinct byte of a pte. therefore, extreme care must be taken to use byte writes when updating only one of these bits.

explicitly altering certain msr bits (using the mtmsr instruction), or explicitly altering ptes, or certain system registers, may have the side effect of changing the effective or physical addresses from which the current instruction stream is being fetched. this kind of side effect is defined as an implicit branch. implicit branches are not supported and an attempt to perform one causes boundedly-undefined results. therefore, ptes must not be changed in a manner that causes an implicit branch. chapter 2, "powerpc register set," in the programming environments manual, lists the possible implicit branch conditions that can occur when system registers and msr bits are changed.
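the statement that v, r, and c each sit in a different byte can be verified from the bit positions given earlier (v is bit 0 of the first pte word; r and c are bits 23 and 24 of the second word). the python sketch below, with hypothetical names, maps a pte bit to its byte offset and byte mask and then sets r with a single-byte update on a big-endian image of the pte.

```python
import struct


def pte_byte_and_mask(word, bit):
    """return (byte offset within the 8-byte pte, mask within that byte)."""
    return 4 * word + bit // 8, 0x80 >> (bit % 8)


V = pte_byte_and_mask(0, 0)     # valid bit
R = pte_byte_and_mask(1, 23)    # referenced bit
C = pte_byte_and_mask(1, 24)    # changed bit


def set_bit_with_byte_store(pte_image, location):
    """set one bit by touching only the byte that contains it."""
    offset, mask = location
    pte_image[offset] |= mask


pte = bytearray(struct.pack(">II", 0x800091B0, 0x12345012))
set_bit_with_byte_store(pte, R)
word0, word1 = struct.unpack(">II", bytes(pte))
```

here v, r, and c map to bytes 0, 6, and 7 respectively, and after the update word1 is 0x12345112 while word0 is unchanged.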
5.5.4 segment register updates

there are certain synchronization requirements for using the move to segment register instructions. these are described in "synchronization requirements for special registers and for lookaside buffers" in chapter 2, "powerpc register set," in the programming environments manual.
chapter 6 instruction timing

this chapter describes instruction prefetch and execution through all of the execution units of the powerpc 603e microprocessor. it also provides examples of instruction sequences showing concurrent execution and various register dependencies to illustrate timing interactions. bus signals described in this chapter are only accurate to within half clock cycle increments. see chapter 8, "system interface operation," for more specific information regarding bus operation timing. instruction mnemonics used in this chapter can be identified by referring to chapter 8, "instruction set," in the programming environments manual.

6.1 terminology and conventions

this section describes terminology and conventions used in this chapter.

branch prediction: the process of guessing whether a branch will be taken. such predictions can be correct or incorrect; the term predicted as it is used here does not imply that the prediction is correct (successful). the powerpc architecture defines a means for static branch prediction, which is part of the instruction encoding.

branch resolution: the determination of whether a branch is taken or not taken. a branch is said to be resolved when it can exactly be determined which path it will take. if the branch is resolved as predicted, the instructions following the predicted branch can be completed. if the branch is not resolved as predicted, instructions on the mispredicted path are purged from the instruction pipeline and are replaced with the instructions from the nonpredicted path.

completion: completion occurs when an instruction is removed from the completion buffer. when an instruction completes we can be sure that this instruction and all previous instructions will cause no exceptions. in some situations, an instruction can finish and complete in the same cycle.

finish: the term indicates the final cycle of execution. in this cycle, the completion buffer is updated to indicate that the instruction has finished executing.

latency: the number of clock cycles necessary to execute an instruction and make ready the results of that execution for a subsequent instruction.

pipeline: in the context of instruction timing, the term pipeline refers to the interconnection of the stages. the events necessary to process an instruction are broken into several cycle-length tasks to allow work to be performed on several instructions simultaneously, analogous to an assembly line. as an instruction is processed, it passes from one stage to the next. when it does, the stage becomes available for the next instruction. although an individual instruction may take many cycles to complete (the number of cycles is called instruction latency), pipelining makes it possible to overlap the processing so that the throughput (number of instructions completed per cycle) is greater than if pipelining were not implemented.

program order: the original order in which program instructions are provided to the instruction queue from the cache.

rename buffer: temporary buffers used by instructions that have not completed and as write-back buffers for those that have.

reservation station: a buffer between the dispatch and execute stages that allows instructions to be dispatched even though the operands required for execution may not yet be available.

stage: an element in the pipeline at which certain actions are performed, such as decoding the instruction, performing an arithmetic operation, and writing back the results. a stage typically takes a cycle to perform its operation; however, some stages are repeated (a double-precision floating-point multiply, for example). when this occurs, an instruction immediately following it in the pipeline is forced to stall in its cycle. in some cases, an instruction may also occupy more than one stage simultaneously; for example, instructions may complete and write back their results in the same cycle. after an instruction is fetched, it can always be defined as being in one or more stages.

stall: an occurrence when an instruction cannot proceed to the next stage.

superscalar: a superscalar processor is one that can issue multiple instructions concurrently from a conventional linear instruction stream. in a superscalar implementation, multiple instructions can be in the same stage at the same time.

throughput: a measure of the number of instructions that are processed per cycle. for example, a series of double-precision floating-point multiply instructions has a throughput of one instruction per clock cycle.

write-back: write-back (in the context of instruction handling) occurs when a result is written from the rename registers into the architectural registers (typically the gprs and fprs). results are written back at completion time or are moved into the write-back buffer. results in the write-back buffer cannot be flushed. if an exception occurs, these buffers must write back before the exception is taken.
6.2 instruction timing overview

the 603e has been designed to minimize average instruction execution latency. latency is defined as the number of clock cycles necessary to execute an instruction and make ready the results of that execution for a subsequent instruction. for many of the instructions in the 603e, this can be simplified to include only the execute phase for a particular instruction. however, data access instructions require additional clock cycles between the execute phase and the write-back phase due to memory latencies.

in accordance with this definition, logical, bit-field, and most integer instructions have a latency of one clock cycle (for example, results for these instructions are ready for use on the next clock cycle after issue). other instructions, such as the integer multiply, require more than one clock cycle to complete execution.

effective throughput of more than one instruction per clock cycle can be realized by the many performance features in the 603e including pipelining, superscalar instruction issue, branch acceleration, and multiple execution units that operate independently and in parallel.

the load/store and floating-point units on the 603e are pipelined, which means that the execution units are broken into stages. each stage performs a specific step, which contributes to the overall execution of an instruction. the pipelined design is analogous to an assembly line where workers perform a specific task and pass the partially complete product to the next worker. (note: the EC603E microprocessor does not support the floating-point unit.)

when an instruction is issued to a pipelined execution unit, the first stage in the pipeline begins its designated work on that instruction. as an instruction is passed from one stage in the pipeline to the next, evacuated stages may accept new instructions. this design allows a single execution unit to be working on several different instructions simultaneously. while it may take several cycles for a given instruction to propagate through the execution pipeline, once the pipeline has been filled with instructions the execution unit is capable of completing an instruction every clock. figure 6-1 shows a graphical representation of a generic pipelined execution unit.
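the overlap described above can be tabulated with a few lines of code. the following python sketch (hypothetical function name, idealized pipeline with no stalls) lists which instruction occupies each stage on every clock; n instructions pass through a k-stage unit in k + n - 1 clocks, so latency stays at k clocks while throughput approaches one instruction per clock.

```python
def pipeline_occupancy(instructions, stages):
    """return one row per clock; row[s] is the instruction in stage s+1, or None."""
    rows = []
    total_clocks = len(instructions) + stages - 1
    for clock in range(total_clocks):
        row = []
        for stage in range(stages):
            index = clock - stage            # instruction that entered 'stage' clocks ago
            if 0 <= index < len(instructions):
                row.append(instructions[index])
            else:
                row.append(None)             # stage is empty on this clock
        rows.append(row)
    return rows


rows = pipeline_occupancy(["a", "b", "c", "d"], stages=3)
```

in this run, the row for clock 2 is ["c", "b", "a"] and the row for clock 3 is ["d", "c", "b"], and the four instructions drain after six clocks in total.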
6-4 mpc603e & EC603E risc microprocessors user's manual motorola figure 6-1. pipelined execution unit if the number of stages in each pipeline is equal to the total latency in clock cycles of its respective execution unit, the processor can continuously issue instructions to the same execution unit without stalling. thus, when enough instructions have been issued to an execution unit to ?l its pipeline, the ?st instruction will have completed execution and exited the pipeline, allowing subsequent instructions to be issued into the tail of the pipeline without interruption. the 603es completion buffer is capable of retiring two instructions on every clock cycle. in general, instruction processing is accomplished in four stages described as follows: the fetch pipeline stage primarily involves retrieving instructions from the memory system and determining the location of the next instruction fetch. additionally, the bpu decodes branches during the fetch stage and folds out branch instructions before the dispatch stage if possible. the instruction fetch stage includes the clock cycles necessary to request instructions from the on-chip cache as well as the time it takes the on-chip cache to respond to that request. the decode/dispatch pipeline stage is responsible for decoding the instructions supplied by the instruction fetch stage, and determining which of the instructions are eligible to be dispatched in the current cycle. in addition, the source operands of the instructions are read from the appropriate register ?e and dispatched with the instruction to the execute pipeline stage. at the end of the dispatch pipeline stage, the dispatched instructions and their operands are latched by the appropriate execution unit. 
during the execute pipeline stage each execution unit that has an executable instruction executes the selected instruction (perhaps over multiple cycles), writes the instruction's result into the appropriate rename register, and noti?s the completion stage that the instruction has ?ished execution. in the case of an internal exception, the execution unit reports the exception to the completion/writeback pipeline stage and discontinues instruction execution until the exception is handled. the exception is not signaled until that instruction is the next to be completed. execution of most ?ating-point instructions is pipelined within the fpu allowing up to three instructions to be executing in the fpu concurrently. the pipeline stages clock 0 (stage 1) a (stage 2) (stage 3) clock 1 (stage 1) b (stage 2) a (stage 3) clock 2 (stage 1) c (stage 2) b (stage 3) a clock 3 (stage 1) d (stage 2) c (stage 3) b
for the floating-point unit are multiply, add, and round-convert. Execution of most load/store instructions is also pipelined. The load/store unit has two pipeline stages. The first stage is for effective address calculation and MMU translation and the second stage is for accessing the data in the cache.

• The complete/writeback pipeline stage maintains the correct architectural machine state and transfers the contents of the rename registers to the GPRs and FPRs as instructions are retired. If the completion logic detects an instruction causing an exception, all following instructions are canceled, their execution results in rename registers are discarded, and instructions are fetched from the correct instruction stream.

More information regarding these operations is provided in the following paragraphs.

6.3 Timing Considerations

A superscalar processor is one that issues multiple independent instructions into multiple pipelines, allowing instructions to execute in parallel. The 603e has five independent execution units (four execution units on the EC603e microprocessor), one each for integer instructions, floating-point instructions (not supported on the EC603e microprocessor), branch instructions, load/store instructions, and system register instructions. The IU and the FPU each have dedicated register files for maintaining operands (GPRs and FPRs, respectively), allowing integer calculations and floating-point calculations to occur simultaneously without interference. Integer division performance of the PID7v-603e has been improved, with the divwux and divwx instructions executing in 20 clock cycles, instead of the 37 cycles required in the PID6-603e.

Note: The FPU is not supported on the EC603e microprocessor; therefore, floating-point instructions are trapped by the floating-point unavailable exception and can be emulated in software.
The 603e is a true superscalar implementation of the PowerPC architecture since a maximum of three instructions can be issued to the execution units (one branch instruction to the branch processing unit, and two instructions issued from the dispatch queue to the other execution units) during each clock cycle. Although a superscalar implementation complicates instruction timing, these complications are transparent to the software. While the 603e appears to the programmer to execute instructions in sequential order, the 603e provides increased performance by executing multiple instructions at a time, and using hardware to manage dependencies. The 603e provides support for single-cycle store and it provides an adder/comparator in the system register unit that allows the dispatch and execution of multiple integer add and compare instructions on each cycle.

When an instruction is issued, the register file places the appropriate source data on the appropriate source bus. The corresponding execution unit then reads the data from the bus.
The register files and source buses have sufficient bandwidth to allow the dispatching of two instructions per clock.

The 603e contains the following execution units that operate independently and in parallel:

• Branch processing unit (BPU)
• 32-bit integer unit (IU)
• 64-bit floating-point unit (FPU) (not supported on the EC603e microprocessor)
• Load/store unit (LSU)
• System register unit (SRU)

The 603e's branch processing unit decodes and executes branches immediately after they are fetched. The resources of the branch unit include a count register (CTR) rename register for mtspr (CTR), a link register (LR) rename register for mtspr (LR), a link register (LR) rename register for branches specifying an update of the link register, and a branch reservation station for conditional branches that cannot be resolved due to a CR-data dependency. When a conditional branch cannot be resolved due to a CR-data dependency, the branch direction is predicted and execution commences down the predicted path. If the branch resolves as incorrectly guessed, then:

1. The instruction buffer is purged and fetching of the correct path commences,
2. Any instructions executed prior to the predicted branch in the completion buffer are allowed to "complete,"
3. All instructions executed subsequent to the mispredicted branch are purged from the machine, and
4. Dispatching down the correct path commences.

When the IU, SRU, or FPU (not supported on the EC603e microprocessor) finishes executing an instruction, it places the resulting data, if any, into one of the general-purpose register (GPR) or floating-point register (FPR) rename registers. The results are then stored into the correct GPR during the write-back stage. If a subsequent instruction is waiting for this data, it is forwarded past the register file, directly into the appropriate execution unit for the immediate execution of the waiting instruction.
This allows a data-dependent instruction to be decoded without waiting for the data to be written into the register file and then read back out again. This feature, known as feed forwarding, significantly shortens the time the machine may stall on data dependencies.

6.3.1 General Instruction Flow

Instructions are fetched from the instruction cache at a peak rate of two per cycle, and placed in either the instruction queue (IQ) or the BPU. Instructions enter the IQ and are issued to the various execution units from the dispatch queue. The IQ is a six-entry queue, which is the backbone of the master pipeline for the microprocessor. The 603e tries to keep the IQ full at all times. Although two instructions can be brought in from the on-chip cache in a single clock cycle, if there is a one-instruction vacancy in the IQ, one instruction will
be fetched from the cache to fill it. If, while topping off the IQ, the request for new instructions misses in the on-chip cache, then arbitration for a memory access will begin.

Instructions enter the IQ through entry 5 and filter down to be issued from queue entry 1 or 0. The fetch bus between the IQ and the on-chip cache is wide enough for two instructions to be brought into the IQ simultaneously, which matches the dispatcher's ability to issue two instructions per cycle.

Branch instructions are identified by the fetcher, and forwarded to the BPU directly, bypassing the dispatch queue. The branch is either executed and resolved (if the branch is unconditional or if required conditions are available), or is predicted. Once a branch instruction has been executed, it may need to update a special-purpose register. In that case, the branch instruction will do its write back sometime after the decode/execute phase. If no write back is needed, the branch instruction is retired.

All other instructions are issued from the dispatch queue, with dispatch rate contingent on execution unit busy status, rename and completion buffer availability, and the serializing behavior of some instructions. Instruction dispatch is done in program order, and if the instruction in queue entry 0 is unable to be dispatched, it will inhibit the instruction in queue entry 1 from being issued.
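The in-order, two-wide dispatch behavior described in this section can be illustrated with a small Python model. This is only a sketch under simplifying assumptions: the names `step`, `iq`, `stream`, and `can_dispatch` are hypothetical, branch folding and cache misses are ignored, and eligibility is reduced to a single caller-supplied predicate rather than the real execution-unit, rename, and completion-buffer checks.

```python
from collections import deque

IQ_DEPTH = 6          # six-entry instruction queue (from the text)
FETCH_WIDTH = 2       # peak fetch rate: two instructions per cycle
DISPATCH_WIDTH = 2    # only queue entries 0 and 1 may dispatch

def step(iq, stream, can_dispatch):
    """Advance the model by one clock; return the instructions dispatched."""
    dispatched = []
    # In-order dispatch: stop at the first instruction that cannot go,
    # so a blocked entry 0 also holds back entry 1.
    while iq and len(dispatched) < DISPATCH_WIDTH and can_dispatch(iq[0]):
        dispatched.append(iq.popleft())
    # Top off the queue, at most two instructions per cycle.
    for _ in range(min(FETCH_WIDTH, IQ_DEPTH - len(iq), len(stream))):
        iq.append(stream.popleft())
    return dispatched

iq = deque()
stream = deque(["add", "lwz", "stall", "add"])   # "stall" is a placeholder
eligible = lambda instr: instr != "stall"        # hypothetical predicate
trace = [step(iq, stream, eligible) for _ in range(3)]
```

In this run the third cycle dispatches nothing: the placeholder instruction at the head of the queue cannot be dispatched, so the eligible `add` behind it also waits.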
Figure 6-2 shows the organization of the 603e, the paths taken by instructions issued from the instruction queue, and how those instructions progress through the various execution units.

Figure 6-2. Instruction Flow Diagram

[Figure: block diagram with the following labeled blocks: fetch, branch processing unit, instruction queue (in program order, entries 5 to 0), dispatch, completion buffer assignment, IU, LSU, store queue, SRU, FPU*, finish, completion queue (in program order, entries 0 to 4), and complete (retire).]

*Note: The EC603e microprocessor does not support the floating-point unit.
6.3.2 Instruction Fetch Timing

The timing of the instruction fetch mechanism on the 603e depends heavily on the state of the on-chip cache. The speed with which the required instruction is returned to the fetcher depends on whether the instruction being asked for is in the on-chip cache (cache hit) or whether a memory transaction is required to bring the data into the cache (cache miss). These issues are discussed further in the following sections.

6.3.2.1 Cache Arbitration

When the instruction fetcher attempts to fetch instructions from the on-chip cache, the cache may or may not be able to immediately respond to the request. There are two scenarios that may be encountered by the instruction fetcher when it requests instructions from the on-chip cache.

The first scenario is when the on-chip cache is idle and a request comes in from the instruction fetcher for additional instructions. In this case, the on-chip cache responds with the requested instructions on the next clock cycle.

The second scenario occurs if, at the time the instruction fetcher requests instructions, the on-chip cache is busy due to a cache-line-reload operation. When this case arises, the on-chip cache will be inaccessible until the reload operation is complete.

6.3.2.2 Cache Hit

Assuming that the instruction fetcher is not blocked from the cache by a cache-reload operation and the instructions it needs are in the on-chip cache (a cache hit has occurred), there will be only one clock cycle between the time that the instruction fetcher requests the instructions and the time that the instructions enter the IQ. As previously stated, two instructions can be simultaneously fetched from the on-chip cache and loaded into the IQ.

Figure 6-3 shows a brief example of instruction fetching that hits in the on-chip cache. In this example, two instructions are fetched into the IQ during clock cycle 0.
During clock cycle 1, instructions 0 and 1 are dispatched to the integer and floating-point execution units.

During clock cycle 2, a branch instruction is fetched into the branch processing unit. The BPU is immediately able to determine that the branch will indeed change program flow and sends a request to the on-chip cache for the new instruction stream.

During clock cycle 4, the new instructions arrive in the IQ.

In clock cycle 5, one integer instruction is dispatched to the integer unit, and the following instruction (also an integer instruction) is blocked from dispatch until clock cycle 6. Instructions fetched in clock cycle 5 are held in the IQ until the dispatch queue is cleared on the next cycle.

As the IQ is emptied into the individual execution units, additional instructions will be requested from the on-chip cache.
Figure 6-3. Instruction Timing: Cache Hit

[Figure: timing chart over clock cycles 0 to 8 showing the fetch, dispatch, execute, write back, held in IQ, and deallocate states for instructions 0 add, 1 fadd, 2 add, 3 fadd, 4 br, 5 fadd, 6 fadd, 7 fadd, 8 add, 9 add, 10 add, and 11 fsub.]

6.3.2.3 Cache Miss

Figure 6-4 shows a brief example of an instruction fetch that misses in the on-chip cache and how that fetch affects the instruction issue. Note that the processor/bus clock ratio is 1:1 in this example. In this example, two instructions are fetched into the IQ during clock cycle 0.

During clock cycle 1, instructions 0 and 1 are dispatched to the integer and floating-point execution units.

During clock cycle 2, a branch instruction is fetched into the branch processing unit. The BPU is immediately able to determine that the branch will indeed change program flow and sends a request to the on-chip cache for the new instruction stream.
Figure 6-4. Instruction Timing: Cache Miss

[Figure: timing chart over clock cycles 0 to 11 showing the address and data bus activity and the fetch, dispatch, execute, write back, and deallocate states for instructions 0 add, 1 fadd, 2 add, 3 fadd, 4 br, 5 add, 6 fsub, 7 add, 8 fsub, 9 add, and 10 fsub.]

During clock cycle 3, the on-chip cache misses the access and determines that a memory access will have to occur.

During clock cycle 5, the address of the block of instructions is applied to the system bus.

During clock cycle 7, two instructions (64 bits) are returned from memory, and are forwarded to the cache and the instruction fetcher.

In subsequent clock cycles, one integer and one floating-point instruction are dispatched to their respective execution units. Instructions are forwarded to the instruction fetcher and the cache until the cache line reload is completed in cycle 10.

6.3.3 Instruction Dispatch and Completion Considerations

Several factors affect the 603e's ability to dispatch instructions at a peak rate of two per cycle. These factors include execution unit availability, destination rename register availability, completion buffer availability, and the handling of dispatch-serialized instructions.

To avoid dispatch unit stalls due to instruction data dependencies, the 603e provides a reservation station for each execution unit. If a data dependency exists that may preclude an instruction from beginning execution, that instruction will be dispatched to the reservation station associated with its execution unit, thereby clearing the dispatch unit. When the data that the operation depends upon is returned via a cache access or as a result of a previous operation, execution will begin during the same clock cycle that the register
file is being updated. If the second instruction in the dispatch unit requires the same execution unit, dispatch of that instruction will stall until the first instruction completes execution.

The completion unit provides a mechanism to track instructions from dispatch through execution, and then retire or "complete" them in program order. Completing an instruction implies the commitment of the results of instruction execution to the architected registers. In-order completion ensures the correct architectural state when the 603e must recover from a mispredicted branch, or any other exception or interrupt. (Note that the term exception is referred to as interrupt in the architecture specification.)

Instruction state and all information required for completion is kept in a first-in, first-out queue of five completion buffers. A single completion buffer is allocated for each instruction once it is dispatched by the dispatch unit. A completion buffer is a required resource for dispatch; if there are no completion buffers available, the dispatch unit will stall. While a maximum of two instructions per cycle may be completed and retired in program order from the completion unit, instruction completion can be stalled by the instruction reaching the last position in the completion queue while the instruction is still being executed. Store instructions, and instructions executed by the FPU (not supported by the EC603e microprocessor) and SRU (with the exception of integer add and compare instructions), can only be retired from the last position in the completion queue.

The rate of instruction completion is also affected by the 603e's ability to write the instruction results from the rename registers to the architected registers when the instruction is retired.
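The completion queue behavior described above (five buffers, allocation at dispatch, in-order retirement of at most two instructions per cycle) can be sketched in Python as follows. The class and method names are hypothetical, and the model deliberately omits the write-back port limits and the rule that stores, FPU instructions, and most SRU instructions retire only from the last position.

```python
from collections import deque

COMPLETION_BUFFERS = 5   # five completion buffers (from the text)
RETIRE_WIDTH = 2         # at most two instructions retired per cycle

class CompletionQueue:
    """Simplified FIFO model; the oldest instruction sits at the left."""

    def __init__(self):
        self.entries = deque()

    def allocate(self, name):
        """Reserve a buffer at dispatch; False means dispatch must stall."""
        if len(self.entries) >= COMPLETION_BUFFERS:
            return False
        self.entries.append({"name": name, "finished": False})
        return True

    def finish(self, name):
        """Record that an instruction has finished execution."""
        for entry in self.entries:
            if entry["name"] == name:
                entry["finished"] = True

    def retire(self):
        """Retire up to two finished instructions, oldest first."""
        retired = []
        while (self.entries and len(retired) < RETIRE_WIDTH
               and self.entries[0]["finished"]):
            retired.append(self.entries.popleft()["name"])
        return retired
```

With instructions `a` through `e` allocated, a sixth allocation is refused; if `b` and `c` finish before `a`, nothing retires until `a` finishes, after which `a` and `b` retire together and `c` follows on the next call.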
The 603e can perform two write-back operations from the rename registers to the GPRs each clock cycle, but can perform only one write back per cycle to the CR, FPR (not supported on the EC603e microprocessor), LR, and CTR.

Due to the 603e's out-of-order execution capability, the in-order completion of instructions by the completion unit provides a precise exception mechanism. All program-related exceptions are signaled when the instruction causing the exception has reached the last position in the completion buffer. All prior instructions are allowed to complete before the exception is taken.

6.3.3.1 Rename Register Operation

To avoid contention for a given register file location in the course of out-of-order execution, the 603e provides rename registers for the storage of instruction results prior to their commitment to the architected register by the completion unit. Five rename registers are provided for the GPRs, four for the FPRs (not supported on the EC603e microprocessor), and one each for the condition register, the link register, and the count register.

When the dispatch unit dispatches an instruction to its execution unit, it allocates a rename register for the results of that instruction. If an instruction is dispatched to a reservation station associated with an execution unit due to a data dependency, the dispatcher will also provide a tag to the execution unit identifying which rename register will forward the
required data upon instruction completion. When the data is available in the rename register, the pending execution may begin.

Instruction results are transferred from the rename registers to the architected registers by the completion unit when an instruction is retired from the completion queue without exceptions and after any predicted branch conditions preceding it in the completion queue have been resolved correctly. If a predicted branch is found to have been incorrectly predicted, the instructions following the branch will be flushed from the completion queue, and the results of those instructions will be flushed from the rename registers.

6.3.3.2 Instruction Serialization

While the 603e is capable of dispatching and completing two instructions per cycle, there is a class of instructions referred to as serializing instructions that limit dispatch and completion to one instruction per cycle. The types of serialization caused by these instructions fall into three categories: completion, dispatch, and refetch serialization.

Completion serialized instructions are held in the execution unit until all prior instructions in the completion unit have been retired. Completion serialization is used for instructions that access or modify nonrenamed resources. Results from these instructions will not be available or forwarded for subsequent instructions until the serializing instruction is retired from the completion unit.
Instructions that are completion serialized are as follows:

• Instructions (with the exception of integer add and compare instructions) executed by the system register unit
• Floating-point instructions that access or modify the FPSCR (not supported on the EC603e microprocessor) or CR (mtfsb1, mcrfs, mtfs, mffs, and mtfsf)
• Instructions that manage caches and TLBs
• Instructions that directly access the GPRs (load and store multiple word and load and store string instructions)
• Instructions defined by the architecture to have synchronizing behavior

A subset of the completion serialized instructions are dispatch serialized. Dispatch serialized instructions inhibit the dispatching of subsequent instructions until the serializing instruction is retired from the completion unit. Dispatch serialization is used for instructions that access renamed resources used by the dispatcher, and for instructions requiring refetch serialization, including:

• The load multiple instructions, lmw, lswi, and lswx
• The mtspr (XER) and mcrxr instructions
• The synchronizing instructions, sync, isync, mtmsr, rfi, and sc

A subset of the dispatch serialized instructions are also refetch serialized. Refetch serialized instructions inhibit dispatching of subsequent instructions and force the refetching of subsequent instructions after the serializing instructions are retired from the completion unit. The context synchronizing instruction, isync, is a refetch serializing instruction.
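The three serialization classes nest: every refetch serialized instruction is also dispatch serialized, and every dispatch serialized instruction is also completion serialized. The Python sketch below captures that nesting for a few of the mnemonics named above. It is illustrative only: the set and function names are hypothetical, and the sets are far from complete (for example, most SRU, cache, and TLB instructions are omitted).

```python
# Partial, illustrative sets built from mnemonics named in the text.
REFETCH = {"isync"}
DISPATCH = REFETCH | {"lmw", "lswi", "lswx", "mcrxr", "sync", "mtmsr", "sc"}
COMPLETION = DISPATCH | {"mtfsb1", "mcrfs", "mffs", "mtfsf"}

def serialization(mnemonic):
    """Return the serialization classes that apply, weakest first."""
    classes = []
    if mnemonic in COMPLETION:
        classes.append("completion")
    if mnemonic in DISPATCH:
        classes.append("dispatch")
    if mnemonic in REFETCH:
        classes.append("refetch")
    return classes

examples = {m: serialization(m) for m in ("isync", "lmw", "mffs", "add")}
```

Here `isync` falls into all three classes, `lmw` into completion and dispatch, `mffs` into completion only, and an ordinary `add` into none.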
6.3.3.3 Execution Unit Considerations

As previously noted, the 603e is capable of dispatching and retiring two instructions per clock cycle. One of the factors affecting the peak dispatch rate is the availability of execution units on each clock cycle.

For an instruction to be issued, the required execution unit must be available. The dispatcher monitors the availability of all execution units and suspends instruction dispatch if the required execution unit is not available. An execution unit may not be available if it can accept and execute only one instruction per cycle, or if an execution unit's pipeline becomes full. This situation may occur if instruction execution takes more clock cycles than the number of pipeline stages in the unit, and additional instructions are issued to that unit to fill the remaining pipeline stages.

6.4 Execution Unit Timings

The following sections describe instruction timing considerations within each of the respective execution units in the 603e. Refer to Table 6-1 for branch instruction execution timing.

6.4.1 Branch Processing Unit Execution Timing

Flow control operations (conditional branches, unconditional branches, and traps) are typically expensive to execute in most machines because they disrupt normal flow in the instruction stream. When a change in program flow occurs, the IQ must be reloaded with the target instruction stream. During this time the execution units will be idle. However, previously issued instructions will continue to execute while the new instruction stream makes its way into the IQ.

Performance features such as branch folding and static branch prediction help minimize the penalties associated with flow control operations on the 603e.
The timing for branch instruction execution is determined by many factors including the following:

• Whether the branch is taken
• Whether the target instruction stream is in the on-chip cache
• Whether the branch is predicted
• Whether the prediction is correct

6.4.1.1 Branch Folding

When a branch instruction is encountered by the fetcher, the BPU immediately tries to pull that instruction out of the instruction stream and resolve it. When the BPU pulls the branch instruction out of the instruction stream, the instruction above the branch is shifted down to take the place of the removed branch. The technique of removing the branch instruction from the instruction sequence seen by the other execution units is known as branch folding.
Often, branch folding reduces the penalties of flow control instructions to zero since instruction execution proceeds as though the branch was never there.

If the folded branch instruction changes program flow (the branch is said to be "taken" in this case), the BPU immediately requests the instructions at the new target from the on-chip cache. In most cases, the new instructions arrive in the IQ before any bubbles are introduced into the execution units.

If the folded branch does not change program flow (the branch is said to be "not taken" in this case), the branch is already removed from the instruction stream and execution continues as if there were never a branch in the original sequence.

When a conditional branch cannot be resolved due to a CR data dependency, the branch is executed by means of static branch prediction, and instruction fetching proceeds down the predicted path. If the branch prediction was incorrect when the branch is resolved, the instruction queue and all subsequently executed instructions are purged, instructions executed prior to the predicted branch are allowed to complete, and instruction fetching resumes down the correct path.

There are several situations where instruction sequences create dependencies that prevent a branch instruction from being resolved immediately, thereby causing execution of the subsequent instruction stream based on the predicted outcome of the branch instruction. The instruction sequences and the resulting actions of the branch instruction are described as follows:

• An mtspr (LR) followed by a bclr: fetching is stopped, and the branch waits for the mtspr to execute.
• An mtspr (CTR) followed by a bcctr: fetching is stopped, and the branch waits for the mtspr to execute.
• An mtspr (CTR) followed by a bc (CTR): fetching is stopped, and the branch waits for the mtspr to execute.
• A bc (CTR) followed by another bc (CTR): fetching is stopped, and the second branch waits for the first branch to be completed.
• A bc (CTR) followed by a bcctr: fetching is stopped, and the bcctr waits for the first branch to be completed.
• A branch (LK = 1) followed by a branch (LK = 1): fetching is stopped, and the second branch waits for the first branch to be completed. (Note: a bl instruction does not have to wait for a branch (LK = 1) to complete.)
• A bc (based-on-CR) waiting for resolution due to a CR-dependency followed by a bc (based-on-CR): fetching is stopped and the second branch waits for the first CR-dependency to be resolved. (Note: branch conditions can be a function of the CTR and the CR; if the CTR condition is sufficient to resolve the branch, then a CR-dependency is ignored.)
6.4.1.2 Static Branch Prediction

Static branch prediction is a mechanism by which software (for example, compilers) can give a hint to the machine hardware about the direction the branch is likely to take.

When a branch instruction encounters a data dependency, the BPU waits for the required condition code to become available. Rather than stalling instruction issue until the source operand is ready, the 603e predicts which path the branch instruction is likely to take, and instructions are fetched and executed along that path. When the branch operand becomes available, the branch is evaluated. If the predicted path was correct, program flow continues along that path uninterrupted; otherwise, the processor backs up, and program flow resumes along the correct path.

There is a scenario where a flow control instruction will not be predicted on the 603e. If the target address of the branch (link or count register) will be modified by an instruction that appears before the branch instruction, the BPU must wait until the target address is available.

The 603e executes through one level of prediction. The microprocessor may not predict a branch if a prior branch instruction is still unresolved.

The number of instructions that can be executed after the issue of a predicted branch instruction is limited by the fact that no instruction executed after a predicted branch may actually update the register files or memory until the branch is completed. That is, instructions may be issued and executed, but may not reach the write-back stage in the completion unit. When an instruction following a predicted branch has completed execution, it will not be moved into the write-back stage; instead, it will simply stall in the last stage of the completion unit. This means that the completion queue may become full, which will limit the number of additional instructions that may be issued subsequent to an unresolved predicted branch.
In the case of a misprediction, the 603e is able to redirect its machine state rather painlessly because the programming model has not been updated. When a branch is found to be mispredicted, all instructions that were issued subsequent to the predicted branch instruction are simply flushed from the completion queue, and their results flushed from the rename registers. No architected register state needs to be restored because no architected register state was modified by the instructions following the unresolved predicted branch.

6.4.1.2.1 Predicted Branch Timing Examples

Figure 6-5 depicts the cases where branch instructions are predicted, and shows both "taken" and "not taken" branch outcomes.

During clock cycle 0, two instructions are dispatched to their respective execution units. Notice that the BPU has a combined decode/execute stage; thus the branch (instruction 1) is predicted not to be taken during clock cycle 1 because its source register (condition register) is not available.

During clock cycle 2, instructions 0 and 2 progress through their pipelines. In addition, the branch (instruction 1) remains predicted. Notice that the next branch instruction (instruction 5) is not able to begin its decode/execute phase while instruction 1 is predicted.
During clock cycle 3, instruction 0 begins its write-back stage. The write back of instruction 0 resolves the data dependency for the first branch (instruction 1); thus the first branch becomes resolved and it is determined that the prediction was correct. Recall that only one branch may be predicted at a time; thus, when instruction 1 is resolved the BPU is free to predict instruction 5.

During clock cycle 4, the second branch instruction remains predicted while additional instructions move through the various pipelines.

During clock cycle 5, the BPU realizes that the prediction made for instruction 5 was incorrect. Note that since instruction 6 was issued and executed conditionally, it never performed its write back. As a result of the misprediction, all instructions that followed the branch in the instruction stream must be flushed from the respective execution unit pipelines. Notice that instructions 6 and 7 do not continue execution since it has been determined that these instructions should have never been issued in the first place. Since the branch has been resolved, a request is sent to the on-chip cache for the new instruction stream (based on the execution of instruction 5).

During clock cycle 6, the new set of instructions is in the IQ, and the appropriate dispatching begins on clock cycle 7.

Figure 6-5. Branch Instruction Timing

[Figure: timing chart over clock cycles 0 to 10 showing the fetch, dispatch, execute, write back, predicted, and deallocate states for instructions 0 add, 1 bc, 2 fadd, 3 add, 4 fadd, 5 bc, 6 add, 7 fadd, 8 and, 9 fsub, 10 or, and 11 fsub.]
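The recovery from a mispredicted branch described in this section amounts to discarding everything younger than the branch, with no architected state to restore. The Python sketch below illustrates the idea using the instruction labels from the example; the function name, the list and dictionary representations, and the stored values are all hypothetical, and rename results are keyed by instruction label purely for readability.

```python
def flush_mispredicted(completion_queue, rename_results, branch):
    """Return the queue with every entry younger than `branch` removed."""
    cut = completion_queue.index(branch) + 1
    for younger in completion_queue[cut:]:
        rename_results.pop(younger, None)   # discard speculative result
    return completion_queue[:cut]

# Oldest instruction first; instructions 6 and 7 follow the predicted branch.
queue = ["4 fadd", "5 bc", "6 add", "7 fadd"]
rename = {"4 fadd": 1.5, "6 add": 42}       # illustrative values only
architected = {"r3": 7}                     # never written speculatively

queue = flush_mispredicted(queue, rename, "5 bc")
```

After the call, only instruction 4 and the branch remain in the queue, the speculative result of instruction 6 is gone from the rename storage, and the architected register contents are untouched.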
6.4.2 Integer Unit Execution Timing

The integer unit executes all integer and bit-field instructions. Many of these instructions execute in a single clock cycle. The integer unit has one execute phase in its pipeline; thus, when a multicycle integer instruction is being executed, no other integer instructions may begin an execute phase. Refer to Table 6-4 for integer instruction execution timing.

6.4.3 Floating-Point Unit Execution Timing

The floating-point unit on the 603e (not supported on the EC603e microprocessor) executes all floating-point instructions. Execution of most floating-point instructions is pipelined within the FPU, allowing up to three instructions to be executing in the FPU concurrently. While most floating-point instructions execute with three- or four-cycle latency, and one- or two-cycle throughput, three instructions (fdivs, fdiv, and fres) execute with latencies of 18 to 33 cycles. The fdivs, fdiv, fres, mtfsb0, mtfsb1, mtfs, mffs, and mtfsf instructions block the floating-point unit pipeline until they complete execution, and thereby inhibit the dispatch of additional floating-point instructions. With the exception of the mcrfs instruction, all floating-point instructions will immediately forward their CR results to the BPU for fast branch resolution without waiting for the instruction to be retired by the completion unit, and the CR updated. Refer to Table 6-5 for floating-point instruction execution timing.

6.4.4 Load/Store Unit Execution Timing

The execution of most load and store instructions is pipelined. The LSU has two pipeline stages; the first stage is for effective address calculation and MMU translation, and the second stage is for accessing the data in the cache. Load and store instructions have a two-cycle latency and one-cycle throughput. Load instructions that miss in the cache block subsequent accesses to the cache while the cache line refill is in process.
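As a rough guide, the latency and throughput figures quoted for a pipelined unit combine in the usual way: the first instruction takes the full latency, and each additional back-to-back instruction adds one throughput interval. The Python sketch below expresses that rule. It assumes independent instructions, no dispatch or completion stalls, and cache hits; the function name is hypothetical.

```python
def pipelined_cycles(count, latency, throughput):
    """Cycles to finish `count` back-to-back instructions in one unit."""
    if count <= 0:
        return 0
    return latency + (count - 1) * throughput

# Four loads that hit in the cache: two-cycle latency, one-cycle throughput.
lsu_cycles = pipelined_cycles(4, latency=2, throughput=1)

# Three floating-point instructions at three-cycle latency, one-cycle throughput.
fpu_cycles = pipelined_cycles(3, latency=3, throughput=1)
```

Under these assumptions both sequences take five cycles, compared with eight and nine cycles respectively if the units were not pipelined.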
Refer to Table 6-6 for load and store instruction execution timing.

6.4.5 System Register Unit Execution Timing

The majority of the instructions executed by the SRU access or modify nonrenamed registers, or directly access renamed registers, and generally execute in a serial manner. Results from these instructions will not be available or forwarded for use by subsequent instructions until the instruction completes and is retired. The SRU can also execute the integer instructions addi, addis, add, addo, cmpi, cmp, cmpli, and cmpl without serialization, and in parallel with another integer instruction. Refer to Section 6.3.3.2, "Instruction Serialization," for additional information on serializing instructions executed by the SRU, and Table 6-2, Table 6-3, and Table 6-4 for SRU instruction execution timing.

6.5 Memory Performance Considerations

Due to the 603e's instruction throughput of three instructions per clock cycle, lack of data bandwidth can become a performance bottleneck. In order for the 603e to approach its potential performance levels, it must be able to read and write data quickly and efficiently.
motorola chapter 6. instruction timing 6-19

if there are many processors in a system environment, one processor may experience long memory latencies while another bus master (for example, a direct memory access controller) is using the external bus. in order to alleviate this possible contention, the 603e provides three memory update modes: copy-back, write-through, and cache-inhibit. each page of memory is specified to be in one of these modes. if a page is in copy-back mode, data being stored to that page is written only to the on-chip cache. if a page is in write-through mode, writes to that page update the on-chip cache on hits and always update main memory. if a page is cache-inhibited, data in that page will never be stored in the on-chip cache. all three of these modes of operation have advantages and disadvantages. a decision as to which mode to use depends on the system environment as well as the application.

this section describes how performance is impacted by each memory update mode. for details about the operation of the on-chip cache and the memory update modes, see chapter 3, "instruction and data cache operation."

6.5.1 copy-back mode

when storing data while in copy-back mode, store operations for cacheable data do not necessarily cause an external bus cycle to update memory. instead, memory updates only occur on modified line replacements, cache flushes, or when another processor attempts to access a specific address for which there is a corresponding modified cache entry. for this reason, copy-back mode may be preferred when external bus bandwidth is a potential bottleneck, for example, in a multiprocessor environment. copy-back mode is also well suited for data that is closely coupled to a processor, such as local variables.

if more than one device uses data stored in a page that is in copy-back mode, snooping must be enabled to allow copy-back operations and cache invalidations of modified data.
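as an illustration only, the effect of a single store under each of the three memory update modes described above can be sketched in python. the names Mode and store_effect are hypothetical and are not part of any 603e software interface; the model deliberately ignores line fills on misses and the later copy-back of modified lines.

```python
from enum import Enum


class Mode(Enum):
    COPY_BACK = "copy-back"
    WRITE_THROUGH = "write-through"
    CACHE_INHIBITED = "cache-inhibited"


def store_effect(mode, cache_hit):
    """Return (updates_cache, updates_memory) for one store.

    Simplified model of the prose above: it does not account for
    cache line fills or for later copy-backs of modified lines.
    """
    if mode is Mode.COPY_BACK:
        # Data is written only to the on-chip cache.
        return (True, False)
    if mode is Mode.WRITE_THROUGH:
        # Cache is updated on hits; main memory is always updated.
        return (cache_hit, True)
    # Cache-inhibited: data is never stored in the on-chip cache.
    return (False, True)


if __name__ == "__main__":
    for mode in Mode:
        for hit in (True, False):
            print(mode.value, "hit" if hit else "miss", store_effect(mode, hit))
```

the sketch makes the bus-bandwidth trade-off visible: only the copy-back case avoids a memory update for the store itself.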
the 603e implements snooping hardware to prevent other devices from accessing invalid data. when bus snooping is enabled, the processor monitors the transactions of the other devices. for example, if another device accesses a memory location and its memory-coherent (m) bit is set, and the 603e's on-chip cache has a modified value for that address, the processor preempts the bus transaction, and updates memory with the cache data. if the cache contents associated with the snooped address are unmodified, the 603e will invalidate the cache block. the other device is then free to attempt an access to the updated memory address. see chapter 3, "instruction and data cache operation," for complete information about bus snooping.

copy-back mode provides complete cache/memory coherency as well as maximizing available external bus bandwidth.

6.5.2 write-through mode

store operations to memory in write-through mode always update memory as well as the on-chip cache (on cache hits). write-through mode is used when the data in the cache must always agree with external memory (for example, video memory), or when there is shared
(global) data that may be used frequently, or when allocation of a cache line on a cache miss is undesirable. automatic copy back of cached data is not performed if that data is from a memory page marked as write-through mode since valid cache data always agrees with memory.

stores to memory that are in write-through mode may cause a decrease in performance. each time a store is performed to memory in write-through mode, the bus will be busy for the extra clock cycles required to perform the memory update; therefore, load operations that miss the on-chip cache must wait while the external store operation completes.

6.5.3 cache-inhibited accesses

if a memory page is specified to be cache-inhibited, data from this page will not be stored in the on-chip cache. areas of the memory map can be cache-inhibited by the operating system software. if a cache-inhibited access hits in the on-chip cache, the corresponding cache line is invalidated. if the line is marked as modified, it is copied back to memory before being invalidated.

in summary, the copy-back mode allows both load and store operations to use the on-chip cache. the write-through mode allows load operations to use the on-chip cache, but store operations cause a memory access and a cache update if the data is already in the cache. lastly, the cache-inhibited mode causes memory access for both loads and stores.

6.6 instruction scheduling guidelines

the performance of the 603e can be improved by avoiding resource conflicts and promoting parallel utilization of execution units through efficient instruction scheduling. instruction scheduling on the 603e can be improved by observing the following guidelines:
- implement good static branch prediction (setting of y bit in bo field).
- when branch prediction is uncertain, or an even probability, predict fall through.
- to reduce mispredictions, separate the instruction that sets cr bits from the branch instruction that evaluates them; separation by more than nine instructions ensures that the cr bits will be immediately available for evaluation.
- when branching conditionally to a location specified by count registers (ctrs) or link registers (lrs), or when branching conditionally based on the value in the count register, separate the mtspr instruction that initializes the ctr or lr from the branch instruction performing the evaluation. separation of the branch instruction and the mtspr instruction by more than nine instructions ensures the register values will be immediately available for use by the branch instruction.
- schedule instructions such that they can dual issue.
- schedule instructions to minimize execution-unit-busy stalls.
- avoid using serializing instructions.
- schedule instructions to avoid dispatch stalls due to renamed resource limitations.
  - only five instructions can be in execute-complete stage at any one time.
  - only five gpr destinations can be in execute-complete-deallocate stage at any one time. note that load with update address instructions use two destination registers.
  - only four fpr destinations can be in execute-complete-deallocate stage at any one time. (not supported on the EC603E microprocessor)

6.6.1 branch, dispatch, and completion unit resource requirements

this section describes the specific resources required to avoid stalls during branch resolution, instruction dispatching, and instruction completion.

6.6.1.1 branch resolution resource requirements

the following is a list of branch instructions and the resources required to avoid stalling the fetch unit in the course of branch resolution:
- the bclr instruction requires lr availability.
- the bcctr instruction requires ctr availability.
- "branch and link" instructions require shadow lr availability.
- the "branch conditional on counter decrement and cr condition" requires ctr availability or the cr condition must be false, and the 603e cannot be executing instructions following an unresolved predicted branch when the branch is encountered by the bpu.
- the "branch conditional on cr condition" cannot be executed following an unresolved predicted branch instruction.
6.6.1.2 dispatch unit resource requirements

the following is a list of resources required to avoid stalls in the dispatch unit; note that the two dispatch buffers are described as dq[0] and dq[1], where dq[0] is the dispatch buffer located at the very bottom of the dispatch queue:

requirements for dispatching from dq[0] are as follows:
- needed execution unit available
- needed gpr rename register(s) available
- needed fpr rename registers available (not supported on the EC603E microprocessor)
- completion buffer is not full
- instruction is dispatch serialized and completion buffer is empty
- a dispatch serialized instruction is not currently being executed
requirements for dispatching from dq[1] are as follows:
- instruction in dq[0] must dispatch
- instruction dispatched by dq[0] is not dispatch serialized
- needed execution unit is available (after dispatch from dq[0])
- needed gpr rename register(s) are available (after dispatch from dq[0])
- needed fpr rename register is available (after dispatch from dq[0]) (not supported on the EC603E microprocessor)
- completion buffer is not full (after dispatch from dq[0])
- instruction dispatched from dq[1] is not dispatch serialized

6.6.1.3 completion unit resource requirements

the following is a list of resources required to avoid stalls in the completion unit; note that the two completion buffers are described as cq[0] and cq[1], where cq[0] is the completion buffer located at the very end of the completion queue:

requirements for completing an instruction from cq[0] are as follows:
- instruction in cq[0] must be finished
- instruction in cq[0] must not follow an unresolved predicted branch
- instruction in cq[0] must not cause an exception

requirements for completing an instruction from cq[1] are as follows:
- instruction in cq[0] must complete in same cycle
- instruction in cq[1] must be finished
- instruction in cq[1] must not follow an unresolved predicted branch
- instruction in cq[1] must not cause an exception
- instruction in cq[1] must be an integer or load instruction
- number of cr updates from both cq[0] and cq[1] must not exceed one
- number of gpr updates from both cq[0] and cq[1] must not exceed two
- number of fpr updates from both cq[0] and cq[1] must not exceed one (not supported on the EC603E microprocessor)

6.7 instruction latency summary

table 6-1 through table 6-6 list the latencies associated with each instruction executed by the 603e. note that the instruction latency tables contain no 64-bit architected instructions. these instructions will trap to an illegal instruction exception handler when encountered.
recall that the term latency is defined as the total time it takes to execute an instruction and make ready the results of that instruction.
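as a rough illustration, the completion requirements listed in section 6.6.1.3 can be written as two predicates. this is a minimal sketch, not a model of the actual completion unit; the Entry record and its field names are hypothetical and exist only to carry the conditions named in the list.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical description of one completion-queue entry."""
    finished: bool
    after_unresolved_branch: bool
    causes_exception: bool
    is_integer_or_load: bool = True
    cr_updates: int = 0
    gpr_updates: int = 0
    fpr_updates: int = 0


def can_complete_cq0(e):
    # The three requirements listed for cq[0].
    return (e.finished
            and not e.after_unresolved_branch
            and not e.causes_exception)


def can_complete_cq1(cq0, cq1):
    # cq[0] must complete in the same cycle, cq[1] must meet the same
    # three conditions, and the combined register updates are limited.
    return (can_complete_cq0(cq0)
            and can_complete_cq0(cq1)
            and cq1.is_integer_or_load
            and cq0.cr_updates + cq1.cr_updates <= 1
            and cq0.gpr_updates + cq1.gpr_updates <= 2
            and cq0.fpr_updates + cq1.fpr_updates <= 1)


if __name__ == "__main__":
    add = Entry(True, False, False, gpr_updates=1)
    load_update = Entry(True, False, False, gpr_updates=2)
    print(can_complete_cq1(add, add))          # two single-gpr integer ops
    print(can_complete_cq1(load_update, add))  # three gpr updates in total
```

the second case shows why load with update address instructions, which use two destination registers, can prevent a following instruction from completing in the same cycle.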
table 6-1 provides the latencies for the branch instructions.

table 6-1. branch instructions
primary  extended  mnemonic             unit  cycles
16       ---       bc [l][a]            bpu   1*
18       ---       b [l][a]             bpu   1*
19       016       bclr [l]             bpu   1*
19       528       bcctr [l]            bpu   1*
* these operations may be folded for an effective cycle time of 0.

table 6-2 provides the latencies for the system register instructions.

table 6-2. system register instructions
primary  extended  mnemonic             unit  cycles
17       --1       sc                   sru   3
19       050       rfi                  sru   3
19       150       isync                sru   1&
31       083       mfmsr                sru   1
31       146       mtmsr                sru   2
31       210       mtsr                 sru   2
31       242       mtsrin               sru   2
31       339       mfspr (not i/dbats)  sru   1
31       339       mfspr (dbats)        sru   3&
31       339       mfspr (ibats)        sru   3&
31       467       mtspr (not ibats)    sru   2 (xer-&)
31       467       mtspr (ibats)        sru   2&
31       595       mfsr                 sru   3&
31       598       sync                 sru   1&
31       659       mfsrin               sru   3&
31       854       eieio                sru   1
31       371       mftb                 sru   1
31       467       mttb                 sru   1
note: cycle times marked with "&" require a variable number of cycles due to serialization.
table 6-3 provides the latencies for the condition register logical instructions.

table 6-3. condition register logical instructions
primary  extended  mnemonic      unit           cycles
19       000       mcrf          sru            1
19       033       crnor         sru            1
19       129       crandc        sru            1
19       193       crxor         sru            1
19       225       crnand        sru            1
19       257       crand         sru            1
19       289       creqv         sru            1
19       417       crorc         sru            1
19       449       cror          sru            1
31       019       mfcr          sru            1
31       144       mtcrf         sru            1
31       512       mcrxr         sru            1&
note: cycle times marked with "&" require a variable number of cycles due to serialization.

table 6-4 provides the latencies for the integer instructions.

table 6-4. integer instructions
primary  extended  mnemonic      unit           cycles
03       ---       twi           integer        2
07       ---       mulli         integer        2,3
08       ---       subfic        integer        1
10       ---       cmpli         integer & sru  1^
11       ---       cmpi          integer & sru  1^
12       ---       addic         integer        1
13       ---       addic.        integer        1
14       ---       addi          integer & sru  1
15       ---       addis         integer & sru  1
20       ---       rlwimi [.]    integer        1
21       ---       rlwinm [.]    integer        1
table 6-4. integer instructions (continued)
primary  extended  mnemonic      unit           cycles
23       ---       rlwnm [.]     integer        1
24       ---       ori           integer        1
25       ---       oris          integer        1
26       ---       xori          integer        1
27       ---       xoris         integer        1
28       ---       andi.         integer        1
29       ---       andis.        integer        1
31       000       cmp           integer & sru  1^
31       004       tw            integer        2
31       008       subfc [o][.]  integer        1
31       010       addc [o][.]   integer        1
31       011       mulhwu [.]    integer        2,3,4,5,6
31       024       slw [.]       integer        1
31       026       cntlzw [.]    integer        1
31       028       and [.]       integer        1
31       032       cmpl          integer & sru  1^
31       040       subf [.]      integer        1
31       060       andc [.]      integer        1
31       075       mulhw [.]     integer        2,3,4,5
31       104       neg [o][.]    integer        1
31       124       nor [.]       integer        1
31       136       subfe [o][.]  integer        1
31       138       adde [o][.]   integer        1
31       200       subfze [o][.] integer        1
31       202       addze [o][.]  integer        1
31       232       subfme [o][.] integer        1
31       234       addme [o][.]  integer        1
31       235       mullw [o][.]  integer        2,3,4,5
31       266       add [o][.]    integer & sru  1 (see note 1)
31       284       eqv [.]       integer        1
31       316       xor [.]       integer        1
table 6-4. integer instructions (continued)
primary  extended  mnemonic      unit           cycles
31       412       orc [.]       integer        1
31       444       or [.]        integer        1
31       459       divwu [o][.]  integer        37
31       476       nand [.]      integer        1
31       491       divw [o][.]   integer        37
31       536       srw [.]       integer        1
31       792       sraw [.]      integer        1
31       824       srawi [.]     integer        1
31       922       extsh [.]     integer        1
31       954       extsb [.]     integer        1
notes: "^" indicates that the instruction immediately forwards its cr results to the bpu for fast branch resolution.
1. the sru can only execute the add and add[o] instructions.

table 6-5 provides the latencies for the floating-point instructions. note that floating-point instructions are not supported on the EC603E microprocessor and execution of a floating-point instruction will result in a trap to the floating-point unavailable exception vector.

table 6-5. floating-point instructions
primary  extended  mnemonic      unit  cycles
59       018       fdivs [.]     fpu   18^
59       020       fsubs [.]     fpu   1-1-1^
59       021       fadds [.]     fpu   1-1-1^
59       024       fres [.]      fpu   18^
59       025       fmuls [.]     fpu   1-1-1^
59       028       fmsubs [.]    fpu   1-1-1^
59       029       fmadds [.]    fpu   1-1-1^
59       030       fnmsubs [.]   fpu   1-1-1^
59       031       fnmadds [.]   fpu   1-1-1^
63       000       fcmpu         fpu   1-1-1^
63       012       frsp [.]      fpu   1-1-1^
63       014       fctiw [.]     fpu   1-1-1^
63       015       fctiwz [.]    fpu   1-1-1^
63       018       fdiv [.]      fpu   33^
table 6-5. floating-point instructions (continued)
primary  extended  mnemonic      unit  cycles
63       020       fsub [.]      fpu   1-1-1^
63       021       fadd [.]      fpu   1-1-1^
63       023       fsel [.]      fpu   1-1-1^
63       025       fmul [.]      fpu   2-1-1^
63       026       frsqrte [.]   fpu   1-1-1^
63       028       fmsub [.]     fpu   2-1-1^
63       029       fmadd [.]     fpu   2-1-1^
63       030       fnmsub [.]    fpu   2-1-1^
63       031       fnmadd [.]    fpu   2-1-1^
63       032       fcmpo         fpu   1-1-1^
63       038       mtfsb1 [.]    fpu   1-1-1&^
63       040       fneg [.]      fpu   1-1-1^
63       064       mcrfs         fpu   1-1-1&
63       070       mtfsb0 [.]    fpu   1-1-1&^
63       072       fmr [.]       fpu   1-1-1^
63       134       mtfsfi [.]    fpu   1-1-1&^
63       136       fnabs [.]     fpu   1-1-1^
63       264       fabs [.]      fpu   1-1-1^
63       583       mffs [.]      fpu   1-1-1&^
63       711       mtfsf [.]     fpu   1-1-1&^
notes: cycle times marked with "&" require a variable number of cycles due to completion serialization. cycle times marked with "^" immediately forward their cr results to the bpu for fast branch resolution. cycle times marked with a "-" specify the number of clock cycles in each pipeline stage. instructions with a single entry in the cycles column are not pipelined.
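the per-stage notation in the cycles column of table 6-5 can be turned into latency and throughput figures with a few lines of python. this is a sketch under one stated assumption: throughput is taken to be the cycle count of the slowest stage, which agrees with the three- or four-cycle latency and one- or two-cycle throughput quoted in section 6.4.3 but is not spelled out in the table notes. the function name fpu_timing is hypothetical.

```python
def fpu_timing(cycles_entry):
    """Return (latency, throughput) for a table 6-5 cycles entry.

    Assumption: throughput equals the longest pipeline stage.
    The '&' and '^' markers are stripped before parsing.
    """
    entry = cycles_entry.rstrip("&^")
    stages = [int(s) for s in entry.split("-")]
    # A single entry (for example "18") means the instruction is not
    # pipelined, so latency and throughput are the same number.
    return sum(stages), max(stages)


if __name__ == "__main__":
    for entry in ("1-1-1^", "2-1-1^", "18^", "1-1-1&^"):
        print(entry, fpu_timing(entry))
```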
table 6-6 provides latencies for the load and store instructions.

table 6-6. load and store instructions
primary  extended  mnemonic  unit  cycles
31       020       lwarx     lsu   2:1
31       023       lwzx      lsu   2:1
31       054       dcbst     lsu   2/5&
31       055       lwzux     lsu   2:1
31       086       dcbf      lsu   2/5&
31       087       lbzx      lsu   2:1
31       119       lbzux     lsu   2:1
31       150       stwcx.    lsu   8
31       151       stwx      lsu   2:1
31       183       stwux     lsu   2:1
31       215       stbx      lsu   2:1
31       246       dcbtst    lsu   2
31       247       stbux     lsu   2:1
31       278       dcbt      lsu   2
31       279       lhzx      lsu   2:1
31       306       tlbie     lsu   3&
31       310       eciwx     lsu   2:1
31       311       lhzux     lsu   2:1
31       343       lhax      lsu   2:1
31       375       lhaux     lsu   2:1
31       407       sthx      lsu   2:1
31       438       ecowx     lsu   2:1
31       439       sthux     lsu   2:1
31       470       dcbi      lsu   2&
31       533       lswx      lsu   2 + n&
31       534       lwbrx     lsu   2:1
31       535       lfsx      lsu   2:1
31       566       tlbsync   lsu   2&
31       567       lfsux     lsu   2:1
31       597       lswi      lsu   2 + n&
31       599       lfdx      lsu   2:1
table 6-6. load and store instructions (continued)
primary  extended  mnemonic  unit  cycles
31       631       lfdux     lsu   2:1
31       661       stswx     lsu   1 + n&
31       662       stwbrx    lsu   2:1
31       663       stfsx     lsu   2:1
31       695       stfsux    lsu   2:1
31       725       stswi     lsu   1 + n&
31       727       stfdx     lsu   2:1
31       759       stfdux    lsu   2:1
31       790       lhbrx     lsu   2:1
31       918       sthbrx    lsu   2:1
31       978       tlbld     lsu   2&
31       982       icbi      lsu   3&
31       983       stfiwx    lsu   2:1
31       1010      tlbli     lsu   3&
31       1014      dcbz      lsu   10&
32       ---       lwz       lsu   2:1
33       ---       lwzu      lsu   2:1
34       ---       lbz       lsu   2:1
35       ---       lbzu      lsu   2:1
36       ---       stw       lsu   2:1
37       ---       stwu      lsu   2:1
38       ---       stb       lsu   2:1
39       ---       stbu      lsu   2:1
40       ---       lhz       lsu   2:1
41       ---       lhzu      lsu   2:1
42       ---       lha       lsu   2:1
43       ---       lhau      lsu   2:1
44       ---       sth       lsu   2:1
45       ---       sthu      lsu   2:1
46       ---       lmw       lsu   2 + n&
47       ---       stmw      lsu   1 + n&
48       ---       lfs       lsu   2:1
table 6-6. load and store instructions (continued)
primary  extended  mnemonic  unit  cycles
49       ---       lfsu      lsu   2:1
50       ---       lfd       lsu   2:1
51       ---       lfdu      lsu   2:1
52       ---       stfs      lsu   2:1
53       ---       stfsu     lsu   2:1
54       ---       stfd      lsu   2:1
55       ---       stfdu     lsu   2:1
notes: cycle times marked with "&" require a variable number of cycles due to serialization. cycle times marked with a "/" specify hit and miss times for cache management instructions that require conditional bus activity. cycle times marked with a ":" specify cycles of total latency and throughput for pipelined load and store instructions. load and store multiple and string instruction cycles are shown as a fixed number of cycles plus a variable number of cycles, where n is the number of words accessed by the instruction.
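to make the "2:1" and "2 + n" entries in table 6-6 concrete, the following python sketch estimates cycle counts for a run of pipelined accesses and for the load and store multiple instructions. it assumes back-to-back, independent accesses that all hit in the cache, and it ignores the variable serialization cost marked with "&"; the function names are hypothetical.

```python
def pipelined_cycles(count, latency=2, throughput=1):
    """Cycles for `count` back-to-back pipelined accesses (a "2:1" entry).

    Assumption: every access hits in the cache and none stalls, so the
    first result takes `latency` cycles and each later one `throughput`.
    """
    if count <= 0:
        return 0
    return latency + (count - 1) * throughput


def multiple_cycles(words, is_load):
    """Cycles for lmw (2 + n) or stmw (1 + n), with n words accessed.

    The additional serialization delay marked "&" is not modeled.
    """
    fixed = 2 if is_load else 1
    return fixed + words


if __name__ == "__main__":
    print(pipelined_cycles(4))         # four independent lwz hits
    print(multiple_cycles(8, True))    # lmw moving 8 words
    print(multiple_cycles(8, False))   # stmw moving 8 words
```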
motorola chapter 7. signal descriptions 7-1

chapter 7 signal descriptions

this chapter describes the powerpc 603e microprocessor's external signals. it contains a concise description of individual signals, showing behavior when the signal is asserted and negated and when the signal is an input and an output.

note: a bar over a signal name indicates that the signal is active low, for example, artry (address retry) and ts (transfer start). active-low signals are referred to as asserted (active) when they are low and negated when they are high. signals that are not active-low, such as ap[0–3] (address bus parity signals) and tt[0–4] (transfer type signals) are referred to as asserted when they are high and negated when they are low.

the 603e signals are grouped as follows:
- address arbitration signals: the 603e uses these signals to arbitrate for address bus mastership.
- address transfer start signals: these signals indicate that a bus master has begun a transaction on the address bus.
- address transfer signals: these signals, which consist of the address bus, address parity, and address parity error signals, are used to transfer the address and to ensure the integrity of the transfer.
- transfer attribute signals: these signals provide information about the type of transfer, such as the transfer size and whether the transaction is bursted, write-through, or cache-inhibited.
- address transfer termination signals: these signals are used to acknowledge the end of the address phase of the transaction. they also indicate whether a condition exists that requires the address phase to be repeated.
- data arbitration signals: the 603e uses these signals to arbitrate for data bus mastership.
- data transfer signals: these signals, which consist of the data bus, data parity, and data parity error signals, are used to transfer the data and to ensure the integrity of the transfer.
- data transfer termination signals: data termination signals are required after each data beat in a data transfer. in a single-beat transaction, the data termination signals also indicate the end of the tenure, while in burst accesses, the data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat. they also indicate whether a condition exists that requires the data phase to be repeated.
- system status signals: these signals include the external interrupt signal, checkstop signals, and both soft- and hard-reset signals. these signals are used to interrupt and, under various conditions, to reset the processor.
- jtag/cop interface signals: the jtag (ieee 1149.1) interface and common on-chip processor (cop) unit provides a serial interface to the system for performing monitoring and boundary tests.
- processor status: these signals include the memory reservation signal, machine quiesce control signals, time base enable signal, and tlbisync signal.
- clock signals: these signals provide for system clock input and frequency control.
7.1 signal configuration

figure 7-1 illustrates the 603e microprocessor's signal configuration, showing how the signals are grouped.

note: a pinout showing actual pin numbers is included in the 603e hardware specifications.

[figure 7-1. signal groups. block diagram of the 603e (+3.3 v) with its signals arranged by group:]
  address arbitration:     br, bg, abb
  address start:           ts
  address bus:             a[0–31], ap[0–3], ape
  transfer attribute:      tt[0–4], tbst, tsiz[0–2], gbl, ci, wt, cse[0–1], tc[0–1]
  address termination:     aack, artry
  clocks:                  sysclk, clk_out, pll_cfg[0–3]
  data arbitration:        dbg, dbwo, dbb
  data transfer:           dh[0–31], dl[0–31], dp[0–7], dpe, dbdis
  data termination:        ta, drtry, tea
  interrupts, checkstops:  int, smi, mcp, ckstp_in, ckstp_out
  reset:                   hreset, sreset
  processor status:        rsrv, qreq, qack, tben, tlbisync
  jtag/cop interface:      trst, tck, tms, tdi, tdo
  lssd test control:       test
7.2 signal descriptions

this section describes individual 603e signals, grouped according to figure 7-1. note that the following sections are intended to provide a quick summary of signal functions. chapter 8, "system interface operation," describes many of these signals in greater detail, both with respect to how individual signals function and how groups of signals interact.

7.2.1 address bus arbitration signals

the address arbitration signals are a collection of input and output signals the 603e uses to request the address bus, recognize when the request is granted, and indicate to other devices when mastership is granted. for a detailed description of how these signals interact, see section 8.3.1, "address bus arbitration."

7.2.1.1 bus request (br) - output

the bus request (br) signal is an output signal on the 603e. following are the state meaning and timing comments for the br signal.

state meaning
  asserted: indicates that the 603e is requesting mastership of the address bus. note that br may be asserted for one or more cycles, and then de-asserted due to an internal cancellation of the bus request (for example, due to a load hit in the touch load buffer). see section 8.3.1, "address bus arbitration."
  negated: indicates that the 603e is not requesting the address bus. the 603e may have no bus operation pending, it may be parked, or the artry input was asserted on the previous bus clock cycle.

timing comments
  assertion: occurs when the 603e is not parked and a bus transaction is needed. this may occur even if the two possible pipeline accesses have occurred. br will also be asserted for one cycle during the execution of a dcbz instruction, and during the execution of a load instruction which hits in the touch load buffer.
  negation: occurs for at least one bus clock cycle after an accepted, qualified bus grant (see bg and abb), even if another transaction is pending.
  it is also negated for at least one bus clock cycle when the assertion of artry is detected on the bus.
7.2.1.2 bus grant (bg) - input

the bus grant (bg) signal is an input signal on the 603e. following are the state meaning and timing comments for the bg signal.

state meaning
  asserted: indicates that the 603e may, with the proper qualification, assume mastership of the address bus. a qualified bus grant occurs when bg is asserted and abb and artry (after aack) are not asserted. the abb and artry signals are driven by the 603e or other bus masters. if the 603e is parked, br need not be asserted for the qualified bus grant. see section 8.3.1, "address bus arbitration."
  negated: indicates that the 603e is not the next potential address bus master.

timing comments
  assertion: may occur at any time to indicate the 603e is free to use the address bus. after the 603e assumes bus mastership, it does not check for a qualified bus grant again until the cycle during which the address bus tenure is completed (assuming it has another transaction to run). the 603e does not accept a bg in the cycles between the assertion of any ts and aack.
  negation: may occur at any time to indicate the 603e cannot use the bus. the 603e may still assume bus mastership on the bus clock cycle of the negation of bg because during the previous cycle bg indicated to the 603e that it was free to take mastership (if qualified).

7.2.1.3 address bus busy (abb)

the address bus busy (abb) signal is both an input and an output signal.

7.2.1.3.1 address bus busy (abb) - output

following are the state meaning and timing comments for the abb output signal.

state meaning
  asserted: indicates that the 603e is the address bus master. see section 8.3.1, "address bus arbitration."
  negated: indicates that the 603e is not using the address bus. if abb is negated during the bus clock cycle following a qualified bus grant, the 603e did not accept mastership, even if br was asserted. this can occur if a potential transaction is aborted internally before the transaction is started.
timing comments
  assertion: occurs on the bus clock cycle following a qualified bg that is accepted by the processor (see negated).
  negation: occurs for a minimum of one-half bus clock cycle following the assertion of aack. if abb is negated during the bus clock cycle following a qualified bus grant, the 603e did not accept mastership, even if br was asserted.
  high impedance: occurs after abb is negated.
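the qualified bus grant condition given in section 7.2.1.2 reduces to a simple boolean expression. the python sketch below is illustrative only: the function names are hypothetical, the inputs are logical "asserted" flags rather than pin voltages, and the "after aack" timing qualification on artry is not modeled.

```python
def is_asserted(pin_level_low):
    """For an active-low signal, asserted means the pin is driven low."""
    return bool(pin_level_low)


def qualified_bus_grant(bg, abb, artry):
    """True when bg is asserted and neither abb nor artry is asserted.

    All arguments are logical flags (True = asserted). The requirement
    that artry be sampled after aack is a timing detail left out here.
    """
    return bg and not abb and not artry


if __name__ == "__main__":
    print(qualified_bus_grant(bg=True, abb=False, artry=False))
    print(qualified_bus_grant(bg=True, abb=True, artry=False))
```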
7.2.1.3.2 address bus busy (abb) - input

following are the state meaning and timing comments for the abb input signal.

state meaning
  asserted: indicates that the address bus is in use. this condition effectively blocks the 603e from assuming address bus ownership, regardless of the bg input; see section 8.3.1, "address bus arbitration."
  negated: indicates that the address bus is not owned by another bus master and that it is available to the 603e when accompanied by a qualified bus grant.

timing comments
  assertion: may occur when the 603e must be prevented from using the address bus (and the processor is not currently asserting abb).
  negation: may occur whenever the 603e can use the address bus.

7.2.2 address transfer start signals

address transfer start signals are input and output signals that indicate that an address bus transfer has begun. the transfer start (ts) signal identifies the operation as a memory transaction. for detailed information about how ts interacts with other signals, refer to section 8.3.2, "address transfer."

7.2.2.1 transfer start (ts)

the ts signal is both an input and an output signal on the 603e.

7.2.2.1.1 transfer start (ts) - output

following are the state meaning and timing comments for the ts output signal.

state meaning
  asserted: indicates that the 603e has begun a memory bus transaction and that the address bus and transfer attribute signals are valid. when asserted with the appropriate tt[0–4] signals it is also an implied data bus request for a memory transaction (unless it is an address-only operation).
  negated: indicates that no bus transaction is occurring during normal operation.

timing comments
  assertion: coincides with the assertion of abb.
  negation: occurs one bus clock cycle after ts is asserted.
  high impedance: coincides with the negation of abb.
7.2.2.1.2 transfer start (ts) - input

following are the state meaning and timing comments for the ts input signal.

state meaning
  asserted: indicates that another master has begun a bus transaction and that the address bus and transfer attribute signals are valid for snooping (see gbl).
  negated: indicates that no bus transaction is occurring.

timing comments
  assertion: may occur during the assertion of abb.
  negation: must occur one bus clock cycle after ts is asserted.

7.2.3 address transfer signals

the address transfer signals are used to transmit the address and to generate and monitor parity for the address transfer. for a detailed description of how these signals interact, refer to section 8.3.2, "address transfer."

7.2.3.1 address bus (a[0–31])

the address bus (a[0–31]) consists of 32 signals that are both input and output signals.

7.2.3.1.1 address bus (a[0–31]) - output

following are the state meaning and timing comments for the a[0–31] output signals.

state meaning
  asserted/negated: represents the physical address (real address in the architecture specification) of the data to be transferred. on burst transfers, the address bus presents the double-word-aligned address containing the critical code/data that missed the cache on a read operation, or the first double word of the cache line on a write operation. note that the address output during burst operations is not incremented. see section 8.3.2, "address transfer."

timing comments
  assertion/negation: occurs on the bus clock cycle after a qualified bus grant (coincides with assertion of abb and ts).
  high impedance: occurs one bus clock cycle after aack is asserted.

7.2.3.1.2 address bus (a[0–31]) - input

following are the state meaning and timing comments for the a[0–31] input signals.

state meaning
  asserted/negated: represents the physical address of a snoop operation.
timing comments
  assertion/negation: must occur on the same bus clock cycle as the assertion of ts; is sampled by the 603e only on this cycle.
7.2.3.2 address bus parity (ap[0–3])

the address bus parity (ap[0–3]) signals are both input and output signals reflecting one bit of odd-byte parity for each of the 4 bytes of address when a valid address is on the bus.

7.2.3.2.1 address bus parity (ap[0–3]) - output

following are the state meaning and timing comments for the ap[0–3] output signal on the 603e.

state meaning
  asserted/negated: represents odd parity for each of 4 bytes of the physical address for a transaction. odd parity means that an odd number of bits, including the parity bit, are driven high. the signal assignments correspond to the following:
    ap0  a[0–7]
    ap1  a[8–15]
    ap2  a[16–23]
    ap3  a[24–31]
  for more information, see section 8.3.2.1, "address bus parity."

timing comments
  assertion/negation: the same as a[0–31].
  high impedance: the same as a[0–31].

7.2.3.2.2 address bus parity (ap[0–3]) - input

following are the state meaning and timing comments for the ap[0–3] input signal on the 603e.

state meaning
  asserted/negated: represents odd parity for each of 4 bytes of the physical address for snooping operations. detected even parity causes the processor to take a machine check exception or enter the checkstop state if address parity checking is enabled in the hid0 register; see section 2.1.2.1, "hardware implementation registers (hid0 and hid1)." (see also the ape signal description.)

timing comments
  assertion/negation: the same as a[0–31].

7.2.3.3 address parity error (ape) - output

the address parity error (ape) signal is an output signal on the 603e. note that the ape signal is an open-drain type output, and requires an external pull-up resistor (for example, 10 kΩ to vdd) to assure proper de-assertion of the ape signal. following are the state meaning and timing comments for the ape signal on the 603e. the ape signal will not be asserted if address parity checking is disabled (hid0[eba] cleared to 0). for more information, see section 8.3.2.1, "address bus parity."
state meaning
  asserted: indicates incorrect address bus parity has been detected by the 603e on a snoop (gbl asserted).
  negated: indicates that the 603e has not detected a parity error (even parity) on the address bus.
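the odd-parity scheme described for ap[0–3] in section 7.2.3.2 can be illustrated with a short python sketch. it assumes the powerpc bit-numbering convention in which a0 is the most significant address bit, so ap0 covers the most significant byte; the function names are hypothetical and the code models only the arithmetic, not the bus timing.

```python
def address_parity(addr):
    """Return [ap0, ap1, ap2, ap3] for a 32-bit physical address.

    Odd parity: each parity bit is chosen so that the byte plus its
    parity bit contain an odd number of ones.
    Assumption: a[0-7] is the most significant byte of `addr`.
    """
    parity = []
    for i in range(4):
        byte = (addr >> (24 - 8 * i)) & 0xFF
        ones = bin(byte).count("1")
        parity.append(0 if ones % 2 else 1)
    return parity


def parity_error(addr, ap):
    """True if the sampled ap bits show even parity on any byte."""
    return ap != address_parity(addr)


if __name__ == "__main__":
    print(address_parity(0x00000000))
    print(address_parity(0x01000003))
    print(parity_error(0x01000003, [1, 1, 1, 1]))
```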
Timing comments
  Assertion: Occurs on the second bus clock cycle after TS is asserted.
  High impedance: Occurs on the third bus clock cycle after TS is asserted.

7.2.4 Address Transfer Attribute Signals
The transfer attribute signals are a set of signals that further characterize the transfer, such as the size of the transfer, whether it is a read or write operation, and whether it is a burst or single-beat transfer. For a detailed description of how these signals interact, see Section 8.3.2, "Address Transfer." Note that some signal functions vary depending on whether the transaction is a memory access or an I/O access.

7.2.4.1 Transfer Type (TT[0-4])
The transfer type (TT[0-4]) signals consist of five input/output signals on the 603e. For a complete description of the TT[0-4] signals and for transfer type encodings, see Table 7-1.

7.2.4.1.1 Transfer Type (TT[0-4]): Output
Following are the state meaning and timing comments for the TT[0-4] output signals on the 603e.
State meaning
  Asserted/Negated: Indicates the type of transfer in progress.
Timing comments
  Assertion/Negation/High impedance: The same as A[0-31].

7.2.4.1.2 Transfer Type (TT[0-4]): Input
Following are the state meaning and timing comments for the TT[0-4] input signals on the 603e.
State meaning
  Asserted/Negated: Indicates the type of transfer in progress (see Table 7-2).
Timing comments
  Assertion/Negation: The same as A[0-31].

Table 7-1 describes the transfer encodings for a 603e bus master.

Table 7-1. Transfer Encoding for the Bus Master
603e bus master transaction | Transaction source | TT0-TT4 | 60x bus specification command | Transaction
N/A | N/A | 00000 | Clean block | Address only
N/A | N/A | 00100 | Flush block | Address only
N/A | N/A | 01000 | Sync | Address only
Address only | dcbz | 01100 | Kill block | Address only
N/A | N/A | 10000 | eieio | Address only
Table 7-1. Transfer Encoding for the Bus Master (continued)
603e bus master transaction | Transaction source | TT0-TT4 | 60x bus specification command | Transaction
Single-beat write (non-GBL) | ecowx | 10100 | External control word write | Single-beat write
N/A | N/A | 11000 | TLB invalidate | Address only
Single-beat read (non-GBL) | eciwx | 11100 | External control word read | Single-beat read
N/A | N/A | 00001 | lwarx reservation set | Address only
N/A | N/A | 00101 | Reserved |
N/A | N/A | 01001 | tlbsync | Address only
N/A | N/A | 01101 | icbi | Address only
N/A | N/A | 1XX01 | Reserved |
Single-beat write | Caching-inhibited or write-through store | 00010 | Write-with-flush | Single-beat write or burst
Burst (non-GBL) | Cast-out, or snoop copyback | 00110 | Write-with-kill | Single-beat write or burst
Single-beat read | Caching-inhibited load or instruction fetch | 01010 | Read | Single-beat read or burst
Burst | Load miss, store miss, or instruction fetch | 01110 | Read-with-intent-to-modify | Burst
Single-beat write | stwcx. | 10010 | Write-with-flush-atomic | Single-beat write
N/A | N/A | 10110 | Reserved | N/A
Single-beat read | lwarx (caching-inhibited load) | 11010 | Read-atomic | Single-beat read or burst
Burst | lwarx (load miss) | 11110 | Read-with-intent-to-modify-atomic | Burst
N/A | N/A | 00011 | Reserved |
N/A | N/A | 00111 | Reserved |
N/A | N/A | 01011 | Read-with-no-intent-to-cache | Single-beat read or burst
N/A | N/A | 01111 | Reserved |
N/A | N/A | 1XX11 | Reserved |
Table 7-2 describes the 60x bus specification transfer encodings and the 603e bus snoop response on an address hit.

Table 7-2. Snoop Hit Response
60x bus specification command | Transaction | TT0-TT4 | 603e bus snooper; action on hit
Clean block | Address only | 00000 | N/A
Flush block | Address only | 00100 | N/A
Sync | Address only | 01000 | N/A
Kill block | Address only | 01100 | Kill, cancel reservation
eieio | Address only | 10000 | N/A
External control word write | Single-beat write | 10100 | N/A
TLB invalidate | Address only | 11000 | N/A
External control word read | Single-beat read | 11100 | N/A
lwarx reservation set | Address only | 00001 | N/A
Reserved | | 00101 | N/A
tlbsync | Address only | 01001 | N/A
icbi | Address only | 01101 | N/A
Reserved | | 1XX01 | N/A
Write-with-flush | Single-beat write or burst | 00010 | Flush, cancel reservation
Write-with-kill | Single-beat write or burst | 00110 | Kill, cancel reservation
Read | Single-beat read or burst | 01010 | Clean or flush
Read-with-intent-to-modify | Burst | 01110 | Flush
Write-with-flush-atomic | Single-beat write | 10010 | Flush, cancel reservation
Reserved | N/A | 10110 | N/A
Read-atomic | Single-beat read or burst | 11010 | Clean or flush
Read-with-intent-to-modify-atomic | Burst | 11110 | Flush
Reserved | | 00011 | N/A
Reserved | | 00111 | N/A
Read-with-no-intent-to-cache | Single-beat read or burst | 01011 | Clean
Reserved | | 01111 | N/A
Reserved | | 1XX11 | N/A
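As an illustration only, the snoop-hit responses of Table 7-2 can be captured in a small lookup for use in a bus monitor or simulation model. The names `SNOOP_ACTION` and `snoop_action` are hypothetical and not part of any Motorola tool; the sketch assumes the TT bits are supplied as a five-character string with TT0 first.

```python
# Hypothetical lookup derived from Table 7-2 (illustrative names only).
# Keys are TT0..TT4 as a 5-character bit string, TT0 first.
# Only encodings with a snoop action other than N/A are listed.
SNOOP_ACTION = {
    "01100": ("kill block", "kill, cancel reservation"),
    "00010": ("write-with-flush", "flush, cancel reservation"),
    "00110": ("write-with-kill", "kill, cancel reservation"),
    "01010": ("read", "clean or flush"),
    "01110": ("read-with-intent-to-modify", "flush"),
    "10010": ("write-with-flush-atomic", "flush, cancel reservation"),
    "11010": ("read-atomic", "clean or flush"),
    "11110": ("read-with-intent-to-modify-atomic", "flush"),
    "01011": ("read-with-no-intent-to-cache", "clean"),
}


def snoop_action(tt_bits):
    """Return the Table 7-2 action on a snoop hit, or None where the table lists N/A."""
    if len(tt_bits) != 5 or set(tt_bits) - {"0", "1"}:
        raise ValueError("expected five TT bits as a string, TT0 first")
    entry = SNOOP_ACTION.get(tt_bits)
    return entry[1] if entry else None
```

For example, `snoop_action("01110")` yields the flush response listed for read-with-intent-to-modify, while address-only encodings such as `"00000"` return `None`.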
The 603e provides transfer type signals (TT[0-4]) that characterize bus transfers. When HID0[ABE] is set, the PID7v-603e performs address-only bus transactions with the encodings shown in Table 7-3.

The 603e provides a CLK_OUT signal for test purposes that allows the monitoring of the processor and bus clock frequencies. The frequency of the CLK_OUT signal is determined by the configuration of the HID0[SBCLK] and HID0[ECLK] bits, as shown in Table 7-4. Note that the PID7v-603e's CLK_OUT signal will be driven at the processor frequency during the assertion of HRESET; when the HRESET signal is deasserted, the CLK_OUT signal enters the default high-impedance state.

7.2.4.2 Transfer Size (TSIZ[0-2]): Output
The transfer size (TSIZ[0-2]) signals consist of three output signals on the 603e. Following are the state meaning and timing comments for the TSIZ[0-2] output signals on the 603e.
State meaning
  Asserted/Negated: For memory accesses, these signals, along with TBST, indicate the data transfer size for the current bus operation, as shown in Table 7-5. Table 8-4 shows how the transfer size signals are used with the address signals for aligned transfers. Table 8-5 shows how the transfer size signals are used with the address signals for misaligned transfers. For external control instructions (eciwx and ecowx), TSIZ[0-2] are used to output bits 29-31 of the external access register (EAR), which are used to form the resource ID (TBST||TSIZ[0-2]).

Table 7-3. Implementation-Specific Transfer Encoding
TT0-TT4 | PID7v-603e transaction | Transaction | Transaction source
00000 | Clean block | Address only | dcbst
00100 | Flush block | Address only | dcbf
01100 | Kill block | Address only | dcbz, dcbi

Table 7-4. CLK_OUT Signal Configuration
HID0[SBCLK] | HID0[ECLK] | CLK_OUT output state
0 | 0 | High-impedance
0 | 1 | Processor clock frequency
1 | 0 | Half-bus clock frequency
1 | 1 | Bus clock frequency
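The CLK_OUT configuration in Table 7-4 amounts to a two-bit selection. A minimal sketch, assuming the two HID0 bits have already been extracted as 0 or 1 (the function name `clk_out_state` is hypothetical, and the HRESET behavior described above is not modeled):

```python
# Hypothetical helper mirroring Table 7-4; bit extraction from HID0 is left to the caller.
CLK_OUT_STATES = {
    (0, 0): "high-impedance",
    (0, 1): "processor clock frequency",
    (1, 0): "half-bus clock frequency",
    (1, 1): "bus clock frequency",
}


def clk_out_state(sbclk, eclk):
    """Return the CLK_OUT output state for the given HID0[SBCLK] and HID0[ECLK] values."""
    if sbclk not in (0, 1) or eclk not in (0, 1):
        raise ValueError("SBCLK and ECLK must each be 0 or 1")
    return CLK_OUT_STATES[(sbclk, eclk)]
```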
Timing comments
  Assertion/Negation: The same as A[0-31].
  High impedance: The same as A[0-31].

7.2.4.3 Transfer Burst (TBST)
The transfer burst (TBST) signal is an input/output signal on the 603e.

7.2.4.3.1 Transfer Burst (TBST): Output
Following are the state meaning and timing comments for the TBST output signal.
State meaning
  Asserted: Indicates that a burst transfer is in progress.
  Negated: Indicates that a burst transfer is not in progress.
  For external control instructions (eciwx and ecowx), TBST is used to output bit 28 of the EAR, which is used to form the resource ID (TBST||TSIZ[0-2]).
Timing comments
  Assertion/Negation: The same as A[0-31].
  High impedance: The same as A[0-31].

7.2.4.3.2 Transfer Burst (TBST): Input
Following are the state meaning and timing comments for the TBST input signal.
State meaning
  Asserted/Negated: Used when snooping for single-beat reads (read with no intent to cache).
Timing comments
  Assertion/Negation: The same as A[0-31].

Table 7-5. Data Transfer Size
TBST | TSIZ[0-2] | Transfer size
Asserted | 010 | Burst (32 bytes)
Negated | 000 | 8 bytes
Negated | 001 | 1 byte
Negated | 010 | 2 bytes
Negated | 011 | 3 bytes
Negated | 100 | 4 bytes
Negated | 101 | 5 bytes
Negated | 110 | 6 bytes
Negated | 111 | 7 bytes
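The encoding in Table 7-5 can be decoded with a few lines of code. The sketch below is illustrative only: `transfer_size_bytes` is a hypothetical helper, TBST is passed as a logical "asserted" flag rather than an electrical level, and TSIZ[0-2] is passed as an integer with TSIZ0 as the most significant bit. It covers memory accesses only, not the eciwx/ecowx resource ID usage.

```python
def transfer_size_bytes(tbst_asserted, tsiz):
    """Decode Table 7-5 for memory accesses.

    tbst_asserted: True when TBST is asserted (logical state, not pin level).
    tsiz: integer 0..7 holding TSIZ[0-2], with TSIZ0 as the most significant bit.
    """
    if not 0 <= tsiz <= 7:
        raise ValueError("TSIZ[0-2] must be in the range 0..7")
    if tbst_asserted:
        # Table 7-5 lists a single burst encoding: TSIZ = 010, 32 bytes.
        if tsiz != 0b010:
            raise ValueError("Table 7-5 lists only TSIZ=010 with TBST asserted")
        return 32
    # With TBST negated, 000 means 8 bytes; otherwise the value is the byte count.
    return 8 if tsiz == 0 else tsiz
```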
7.2.4.4 Transfer Code (TC[0-1]): Output
The transfer code (TC[0-1]) consists of two output signals on the 603e. Following are the state meaning and timing comments for the TC[0-1] signals.
State meaning
  Asserted/Negated: Represents a special encoding for the transfer in progress (see Table 7-6).
Timing comments
  Assertion/Negation: The same as A[0-31].
  High impedance: The same as A[0-31].

7.2.4.5 Cache Inhibit (CI): Output
The cache inhibit (CI) signal is an output signal on the 603e. Following are the state meaning and timing comments for the CI signal.
State meaning
  Asserted: Indicates that a single-beat transfer will not be cached, reflecting the setting of the I bit for the block or page that contains the address of the current transaction.
  Negated: Indicates that a burst transfer will allocate a line in the 603e data cache.
Timing comments
  Assertion/Negation: The same as A[0-31].
  High impedance: The same as A[0-31].

7.2.4.6 Write-Through (WT): Output
The write-through (WT) signal is an output signal on the 603e. Following are the state meaning and timing comments for the WT signal.
State meaning
  Asserted: Indicates that a single-beat transaction is write-through, reflecting the value of the W bit for the block or page that contains the address of the current transaction.
  Negated: Indicates that a transaction is not write-through.
Timing comments
  Assertion/Negation: The same as A[0-31].
  High impedance: The same as A[0-31].

Table 7-6. Encodings for TC[0-1] Signals
TC[0-1] | Read | Write
0 0 | Data transaction | Any write
0 1 | Touch load |
1 0 | Instruction fetch |
1 1 | Reserved |
7.2.4.7 Global (GBL)
The global (GBL) signal is an input/output signal on the 603e.

7.2.4.7.1 Global (GBL): Output
Following are the state meaning and timing comments for the GBL output signal.
State meaning
  Asserted: Indicates that a transaction is global, reflecting the setting of the M bit for the block or page that contains the address of the current transaction (except in the case of copy-back operations and instruction fetches, which are nonglobal).
  Negated: Indicates that a transaction is not global.
Timing comments
  Assertion/Negation: The same as A[0-31].
  High impedance: The same as A[0-31].

7.2.4.7.2 Global (GBL): Input
Following are the state meaning and timing comments for the GBL input signal.
State meaning
  Asserted: Indicates that a transaction must be snooped by the 603e.
  Negated: Indicates that a transaction is not snooped by the 603e.
Timing comments
  Assertion/Negation: The same as A[0-31].

7.2.4.8 Cache Set Entry (CSE[0-1]): Output
Following are the state meaning and timing comments for the CSE[0-1] signals.
State meaning
  Asserted/Negated: Represents the cache replacement set element for the current transaction reloading into or writing out of the cache. Can be used with the address bus and the transfer attribute signals to externally track the state of each cache line in the 603e's cache. Note that the CSE[0-1] signals are not meaningful during data cache touch load operations.
Timing comments
  Assertion/Negation: The same as A[0-31].
  High impedance: The same as A[0-31].

7.2.5 Address Transfer Termination Signals
The address transfer termination signals are used to indicate either that the address phase of the transaction has completed successfully or must be repeated, and when it should be terminated. For detailed information about how these signals interact, see Section 8.3.3, "Address Transfer Termination."
7.2.5.1 Address Acknowledge (AACK): Input
The address acknowledge (AACK) signal is an input signal (input-only) on the 603e. Following are the state meaning and timing comments for the AACK signal.
State meaning
  Asserted: Indicates that the address phase of a transaction is complete. The address bus will go to a high-impedance state on the next bus clock cycle. The 603e samples ARTRY on the bus clock cycle following the assertion of AACK.
  Negated: (During ABB) indicates that the address bus and the transfer attributes must remain driven.
Timing comments
  Assertion: May occur as early as the bus clock cycle after TS is asserted (unless the 603e is configured for 1:1 or 1.5:1 clock modes, when AACK can be asserted no sooner than the second cycle following the assertion of TS, that is, one address wait state); assertion can be delayed to allow adequate address access time for slow devices. For example, if an implementation supports slow snooping devices, an external arbiter can postpone the assertion of AACK.
  Negation: Must occur one bus clock cycle after the assertion of AACK.

7.2.5.2 Address Retry (ARTRY)
The address retry (ARTRY) signal is both an input and output signal on the 603e.

7.2.5.2.1 Address Retry (ARTRY): Output
Following are the state meaning and timing comments for the ARTRY output signal.
State meaning
  Asserted: Indicates that the 603e detects a condition in which a snooped address tenure must be retried. If the 603e needs to update memory as a result of the snoop that caused the retry, the 603e asserts BR the second cycle after AACK if ARTRY is asserted.
  High impedance: Indicates that the 603e does not need the snooped address tenure to be retried.
Timing comments
  Assertion: Asserted the third bus cycle following the assertion of TS if a retry is required.
  Negation: Occurs the second bus cycle after the assertion of AACK.
  Because this signal may be driven simultaneously by multiple devices, it negates in a unique fashion. First the buffer goes to high impedance for a minimum of one-half processor cycle (dependent on the clock mode); then it is driven negated for one bus cycle before returning to high impedance. This special method of negation may be disabled by setting precharge disable in HID0.
7.2.5.2.2 Address Retry (ARTRY): Input
Following are the state meaning and timing comments for the ARTRY input signal.
State meaning
  Asserted: If the 603e is the address bus master, ARTRY indicates that the 603e must retry the preceding address tenure and immediately negate BR (if asserted). If the associated data tenure has already started, the 603e will also abort the data tenure immediately, even if the burst data has been received. If the 603e is not the address bus master, this input indicates that the 603e should immediately negate BR for one bus clock cycle following the assertion of ARTRY by the snooping bus master to allow an opportunity for a copy-back operation to main memory. Note that the subsequent address presented on the address bus may not be the same one associated with the assertion of the ARTRY signal.
  Negated/High impedance: Indicates that the 603e does not need to retry the last address tenure.
Timing comments
  Assertion: May occur as early as the second cycle following the assertion of TS, and must occur by the bus clock cycle immediately following the assertion of AACK if an address retry is required.
  Negation: Must occur during the second cycle after the assertion of AACK.

7.2.6 Data Bus Arbitration Signals
Like the address bus arbitration signals, data bus arbitration signals maintain an orderly process for determining data bus mastership. Note that there is no data bus arbitration signal equivalent to the address bus arbitration signal BR (bus request), because, except for address-only transactions, TS implies data bus requests. For a detailed description of how these signals interact, see Section 8.4.1, "Data Bus Arbitration."

One special signal, DBWO, allows the 603e to be configured dynamically to write data out of order with respect to read data. For detailed information about using DBWO, see Section 8.10, "Using Data Bus Write Only."
7.2.6.1 Data Bus Grant (DBG): Input
The data bus grant (DBG) signal is an input signal (input-only) on the 603e. Following are the state meaning and timing comments for the DBG signal.
State meaning
  Asserted: Indicates that the 603e may, with the proper qualification, assume mastership of the data bus. The 603e derives a qualified data bus grant when DBG is asserted and DBB, DRTRY, and ARTRY are negated; that is, the data bus is not busy (DBB is negated), there is no outstanding attempt to retry the current data tenure (DRTRY is negated), and there is no outstanding attempt to perform an ARTRY of the associated address tenure.
  Negated: Indicates that the 603e must hold off its data tenures.
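The qualification rule in the state meaning above reduces to a single Boolean expression. The following sketch is only a model of that rule, with a hypothetical function name; each argument is True when the corresponding signal is asserted in the logical sense, independent of its active-low pin level.

```python
def qualified_data_bus_grant(dbg, dbb, drtry, artry):
    """Return True when the grant is qualified.

    Each argument is True when the named signal is asserted (logical state).
    A grant is qualified when DBG is asserted and DBB, DRTRY, and ARTRY
    are all negated.
    """
    return bool(dbg) and not (dbb or drtry or artry)
```

A bus model might evaluate this once per bus clock cycle and start a data tenure on the following cycle when it returns True.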
Timing comments
  Assertion: May occur any time to indicate the 603e is free to take data bus mastership. It is not sampled until TS is asserted.
  Negation: May occur at any time to indicate the 603e cannot assume data bus mastership.

7.2.6.2 Data Bus Write Only (DBWO): Input
The data bus write only (DBWO) signal is an input signal (input-only) on the 603e. Following are the state meaning and timing comments for the DBWO signal.
State meaning
  Asserted: Indicates that the 603e may run the data bus tenure for an outstanding write address even if a read address is pipelined before the write address. Refer to Section 8.10, "Using Data Bus Write Only," for detailed instructions for using DBWO.
  Negated: Indicates that the 603e must run the data bus tenures in the same order as the address tenures.
Timing comments
  Assertion: Must occur no later than a qualified DBG for an outstanding write tenure. DBWO is only recognized by the 603e on the clock of a qualified DBG. If no write requests are pending, the 603e will ignore DBWO and assume data bus ownership for the next pending read request.
  Negation: May occur any time after a qualified DBG and before the next assertion of DBG.

7.2.6.3 Data Bus Busy (DBB)
The data bus busy (DBB) signal is both an input and output signal on the 603e.

7.2.6.3.1 Data Bus Busy (DBB): Output
Following are the state meaning and timing comments for the DBB output signal.
State meaning
  Asserted: Indicates that the 603e is the data bus master. The 603e always assumes data bus mastership if it needs the data bus and is given a qualified data bus grant (see DBG).
  Negated: Indicates that the 603e is not using the data bus.
Timing comments
  Assertion: Occurs during the bus clock cycle following a qualified DBG.
  Negation: Occurs for a minimum of one-half bus clock cycle (dependent on clock mode) following the assertion of the final TA.
  High impedance: Occurs after DBB is negated.
7.2.6.3.2 Data Bus Busy (DBB): Input
Following are the state meaning and timing comments for the DBB input signal.
State meaning
  Asserted: Indicates that another device is bus master.
  Negated: Indicates that the data bus is free (with proper qualification, see DBG) for use by the 603e.
Timing comments
  Assertion: Must occur when the 603e must be prevented from using the data bus.
  Negation: May occur whenever the data bus is available.

7.2.7 Data Transfer Signals
Like the address transfer signals, the data transfer signals are used to transmit data and to generate and monitor parity for the data transfer. For a detailed description of how the data transfer signals interact, see Section 8.4.3, "Data Transfer."

7.2.7.1 Data Bus (DH[0-31], DL[0-31])
The data bus (DH[0-31] and DL[0-31]) consists of 64 signals that are both input and output on the 603e. Following are the state meaning and timing comments for the DH and DL signals.
State meaning
  The data bus has two halves: data bus high (DH) and data bus low (DL). See Table 7-7 for the data bus lane assignments.
Timing comments
  The data bus is driven once for noncached transactions and four times for cache transactions (bursts).

7.2.7.1.1 Data Bus (DH[0-31], DL[0-31]): Output
Following are the state meaning and timing comments for the DH and DL output signals.
State meaning
  Asserted/Negated: Represents the state of data during a data write. Byte lanes not selected for data transfer will not supply valid data.
Timing comments
  Assertion/Negation: Initial beat coincides with DBB and, for bursts, transitions on the bus clock cycle following each assertion of TA.
  High impedance: Occurs on the bus clock cycle after the final assertion of TA.

Table 7-7. Data Bus Lane Assignments
Data bus signals | Byte lane
DH[0-7] | 0
DH[8-15] | 1
DH[16-23] | 2
DH[24-31] | 3
DL[0-7] | 4
DL[8-15] | 5
DL[16-23] | 6
DL[24-31] | 7
7.2.7.1.2 Data Bus (DH[0-31], DL[0-31]): Input
Following are the state meaning and timing comments for the DH and DL input signals.
State meaning
  Asserted/Negated: Represents the state of data during a data read transaction.
Timing comments
  Assertion/Negation: Data must be valid on the same bus clock cycle that TA is asserted.

7.2.7.2 Data Bus Parity (DP[0-7])
The eight data bus parity (DP[0-7]) signals on the 603e are both output and input signals.

7.2.7.2.1 Data Bus Parity (DP[0-7]): Output
Following are the state meaning and timing comments for the DP output signals.
State meaning
  Asserted/Negated: Represents odd parity for each of 8 bytes of data write transactions. Odd parity means that an odd number of bits, including the parity bit, are driven high. The signal assignments are listed in Table 7-8.
Timing comments
  Assertion/Negation: The same as DL[0-31].
  High impedance: The same as DL[0-31].

7.2.7.2.2 Data Bus Parity (DP[0-7]): Input
Following are the state meaning and timing comments for the DP input signals.
State meaning
  Asserted/Negated: Represents odd parity for each byte of read data. Parity is checked on all data byte lanes, regardless of the size of the transfer. Detected even parity causes a checkstop if data parity errors are enabled in the HID0 register. (See DPE.)
Timing comments
  Assertion/Negation: The same as DL[0-31].

Table 7-8. DP[0-7] Signal Assignments
Signal name | Signal assignments
DP0 | DH[0-7]
DP1 | DH[8-15]
DP2 | DH[16-23]
DP3 | DH[24-31]
DP4 | DL[0-7]
DP5 | DL[8-15]
DP6 | DL[16-23]
DP7 | DL[24-31]
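The odd-parity rule and the assignments in Table 7-8 can be illustrated with a short sketch. The function names are hypothetical, and the code assumes DH and DL are each held as a 32-bit integer in which bit 0 of the PowerPC numbering is the most significant bit, so DH[0-7] is the top byte of the DH word. The same per-byte rule applies to AP[0-3] over the four address bytes.

```python
def odd_parity_bit(byte):
    """Return the parity bit that makes the count of 1s (data plus parity) odd."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")
    ones = bin(byte).count("1")
    # If the data already has an odd number of 1s, the parity bit is 0.
    return 0 if ones % 2 == 1 else 1


def data_parity(dh, dl):
    """Return [DP0, ..., DP7] for 32-bit DH and DL words.

    Assumes bit 0 is the most significant bit, so DP0 covers the top
    byte of dh (DH[0-7]) and DP7 covers the low byte of dl (DL[24-31]).
    """
    lanes = []
    for word in (dh, dl):
        if not 0 <= word <= 0xFFFFFFFF:
            raise ValueError("expected 32-bit words")
        for shift in (24, 16, 8, 0):
            lanes.append((word >> shift) & 0xFF)
    return [odd_parity_bit(b) for b in lanes]
```

A checker for read data would recompute these bits from the received byte lanes and compare them with the sampled DP[0-7] values; a mismatch corresponds to detected even parity.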
7.2.7.3 Data Parity Error (DPE): Output
The data parity error (DPE) signal is an output signal (output-only) on the 603e. Note that the DPE signal is an open-drain type output and requires an external pull-up resistor (for example, 10 kΩ to Vdd) to ensure proper de-assertion of the DPE signal. Following are the state meaning and timing comments for the DPE signal.
State meaning
  Asserted: Indicates incorrect data bus parity.
  Negated: Indicates correct data bus parity.
Timing comments
  Assertion: Occurs on the second bus clock cycle after TA is asserted to the 603e, unless TA is cancelled by an assertion of DRTRY.
  High impedance: Occurs on the third bus clock cycle after TA is asserted to the 603e.

7.2.7.4 Data Bus Disable (DBDIS): Input
The data bus disable (DBDIS) signal is an input signal (input-only) on the 603e. Following are the state meanings and timing comments for the DBDIS signal.
State meaning
  Asserted: Indicates (for a write transaction) that the 603e must release the data bus and the data bus parity to high impedance during the following cycle. The data tenure will remain active, DBB will remain driven, and the transfer termination signals will still be monitored by the 603e.
  Negated: Indicates that the data bus should remain normally driven. DBDIS is ignored during read transactions.
Timing comments
  Assertion/Negation: May be asserted on any clock cycle when the 603e is driving, or will be driving, the data bus; may remain asserted for multiple cycles.

7.2.8 Data Transfer Termination Signals
Data termination signals are required after each data beat in a data transfer. Note that in a single-beat transaction, the data termination signals also indicate the end of the tenure, while in burst accesses, the data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat. For a detailed description of how these signals interact, see Section 8.4.4, "Data Transfer Termination."
7.2.8.1 Transfer Acknowledge (TA): Input
The transfer acknowledge (TA) signal is an input signal (input-only) on the 603e. Following are the state meaning and timing comments for the TA signal.
State meaning
  Asserted: Indicates that a single-beat data transfer completed successfully or that a data beat in a burst transfer completed successfully (unless DRTRY is asserted on the next bus clock cycle). Note that TA must be asserted for each data beat in a burst transaction, and must be asserted during assertion of DRTRY. For more information, see Section 8.4.4, "Data Transfer Termination."
  Negated: (During DBB) indicates that, until TA is asserted, the 603e must continue to drive the data for the current write or must wait to sample the data for reads.
Timing comments
  Assertion: Must not occur before AACK for the current transaction (if the address retry mechanism is to be used to prevent invalid data from being used by the processor); otherwise, assertion may occur at any time during the assertion of DBB. The system can withhold assertion of TA to indicate that the 603e should insert wait states to extend the duration of the data beat.
  Negation: Must occur after the bus clock cycle of the final (or only) data beat of the transfer. For a burst transfer, the system can assert TA for one bus clock cycle and then negate it to advance the burst transfer to the next beat and insert wait states during the next beat. (Note: When the 603e is configured for 1:1 clock mode and is performing a burst read into the data cache, the 603e requires one wait state between the assertion of TS and the first assertion of TA for that transaction. If no-DRTRY mode is also selected, the 603e requires two wait states for 1:1 clock mode, or one wait state for 1.5:1 clock mode.)

7.2.8.2 Data Retry (DRTRY): Input
The data retry (DRTRY) signal is input only on the 603e.
Following are the state meaning and timing comments for the DRTRY signal.
State meaning
  Asserted: Indicates that the 603e must invalidate the data from the previous read operation.
  Negated: Indicates that data presented with TA on the previous read operation is valid. Note that DRTRY is ignored for write transactions.
Timing comments
  Assertion: Must occur during the bus clock cycle immediately after TA is asserted if a retry is required. The DRTRY signal may be held asserted for multiple bus clock cycles. When DRTRY is negated, data must have been valid on the previous clock with TA asserted.
  Negation: Must occur during the bus clock cycle after a valid data beat. This may occur several cycles after DBB is negated, effectively extending the data bus tenure.
  Start-up: The DRTRY signal is sampled at the negation of HRESET; if DRTRY is asserted, no-DRTRY mode is selected. If DRTRY is negated at start-up, DRTRY is enabled.

7.2.8.3 Transfer Error Acknowledge (TEA): Input
The transfer error acknowledge (TEA) signal is input only on the 603e. Following are the state meaning and timing comments for the TEA signal.
State meaning
  Asserted: Indicates that a bus error occurred. Causes a machine check exception (and possibly causes the processor to enter checkstop state if the machine check enable bit is cleared (MSR[ME] = 0)). For more information, see Section 4.5.2.2, "Checkstop State (MSR[ME] = 0)." Assertion terminates the current transaction; that is, assertions of TA and DRTRY are ignored. The assertion of TEA causes the negation/high impedance of DBB in the next clock cycle. However, data entering the GPR or the cache are not invalidated. (Note that the term "exception" is also referred to as "interrupt" in the architecture specification.)
  Negated: Indicates that no bus error was detected.
Timing comments
  Assertion: May be asserted while DBB is asserted, and the cycle after TA during a read operation. TEA should be asserted for one cycle only.
  Negation: TEA must be negated no later than the negation of DBB.

7.2.9 System Status Signals
Most system status signals are input signals that indicate when exceptions are received, when checkstop conditions have occurred, and when the 603e must be reset. The 603e generates the output signal, CKSTP_OUT, when it detects a checkstop condition. For a detailed description of these signals, see Section 8.7, "Interrupt, Checkstop, and Reset Signals."

7.2.9.1 Interrupt (INT): Input
The interrupt (INT) signal is input only.
Following are the state meaning and timing comments for the INT signal.
State meaning
  Asserted: The 603e initiates an interrupt if MSR[EE] is set; otherwise, the 603e ignores the interrupt. To guarantee that the 603e will take the external interrupt, the INT signal must be held active until the 603e takes the interrupt; otherwise, whether the 603e takes an external interrupt depends on whether the MSR[EE] bit was set while the INT signal was held active.
  Negated: Indicates that normal operation should proceed. See Section 8.7.1, "External Interrupts."
Timing comments
  Assertion: May occur at any time and may be asserted asynchronously to the input clocks. The INT input is level-sensitive.
  Negation: Should not occur until the interrupt is taken.

7.2.9.2 System Management Interrupt (SMI): Input
The system management interrupt (SMI) signal is input only. Following are the state meaning and timing comments for the SMI signal.
State meaning
  Asserted: The 603e initiates a system management interrupt operation if MSR[EE] is set; otherwise, the 603e ignores the exception condition. The SMI signal must be held active until the exception is taken.
  Negated: Indicates that normal operation should proceed. See Section 8.7.1, "External Interrupts."
Timing comments
  Assertion: May occur at any time and may be asserted asynchronously to the input clocks. The SMI input is level-sensitive.
  Negation: Should not occur until the interrupt is taken.

7.2.9.3 Machine Check Interrupt (MCP): Input
The machine check interrupt (MCP) signal is input only on the 603e. Following are the state meaning and timing comments for the MCP signal.
State meaning
  Asserted: The 603e initiates a machine check interrupt operation if MSR[ME] and HID0[EMCP] are set; if MSR[ME] is cleared and HID0[EMCP] is set, the 603e must terminate operation by internally gating off all clocks and releasing all outputs (except CKSTP_OUT) to the high-impedance state. If HID0[EMCP] is cleared, the 603e ignores the interrupt condition. The MCP signal must be held asserted for 2 bus clock cycles.
  Negated: Indicates that normal operation should proceed. See Section 8.7.1, "External Interrupts."
Timing comments
  Assertion: May occur at any time and may be asserted asynchronously to the input clocks. The MCP input is negative edge-sensitive.
  Negation: May be negated 2 bus cycles after assertion.
7.2.9.4 Checkstop Input (CKSTP_IN): Input
The checkstop input (CKSTP_IN) signal is input only on the 603e. Following are the state meaning and timing comments for the CKSTP_IN signal.
State meaning
  Asserted: Indicates that the 603e must terminate operation by internally gating off all clocks, and release all outputs (except CKSTP_OUT) to the high-impedance state. Once CKSTP_IN has been asserted, it must remain asserted until the system has been reset.
  Negated: Indicates that normal operation should proceed. See Section 8.7.2, "Checkstops."
Timing comments
  Assertion: May occur at any time and may be asserted asynchronously to the input clocks.
  Negation: May occur any time after the CKSTP_OUT output signal has been asserted.

7.2.9.5 Checkstop Output (CKSTP_OUT): Output
The checkstop output (CKSTP_OUT) signal is output only on the 603e. Note that the CKSTP_OUT signal is an open-drain type output and requires an external pull-up resistor (for example, 10 kΩ to Vdd) to ensure proper de-assertion of the CKSTP_OUT signal. Following are the state meaning and timing comments for the CKSTP_OUT signal.
State meaning
  Asserted: Indicates that the 603e has detected a checkstop condition and has ceased operation.
  Negated: Indicates that the 603e is operating normally. See Section 8.7.2, "Checkstops."
Timing comments
  Assertion: May occur at any time and may be asserted asynchronously to the 603e input clocks.
  Negation: Is negated upon assertion of HRESET.

7.2.9.6 Reset Signals
There are two reset signals on the 603e: hard reset (HRESET) and soft reset (SRESET). Descriptions of the reset signals are as follows:

7.2.9.6.1 Hard Reset (HRESET): Input
The hard reset (HRESET) signal is input only and must be used at power-on to properly reset the processor. Following are the state meaning and timing comments for the HRESET signal.
State meaning
  Asserted: Initiates a complete hard reset operation when this input transitions from asserted to negated. Causes a reset exception as described in Section 4.5.1.1, "Hard Reset and Power-On Reset." Output drivers are released to high impedance within five clocks after the assertion of HRESET.
  Negated: Indicates that normal operation should proceed. See Section 8.7.3, "Reset Inputs."
Timing Comments
  Assertion: May occur at any time and may be asserted asynchronously to the 603e input clock; must be held asserted for a minimum of 255 clock cycles after the PLL lock time has been met. Refer to the appropriate hardware specifications for further timing comments.
  Negation: May occur any time after the minimum reset pulse width has been met.
This input has additional functionality in certain test modes.

7.2.9.6.2 Soft Reset (SRESET) - Input
The soft reset (SRESET) signal is input only. Following are the state meaning and timing comments for the SRESET signal.
State Meaning
  Asserted: Initiates processing for a reset exception as described in Section 4.5.1.2, "Soft Reset."
  Negated: Indicates that normal operation should proceed. See Section 8.7.3, "Reset Inputs."
Timing Comments
  Assertion: May occur at any time and may be asserted asynchronously to the 603e input clock. The SRESET input is negative edge-sensitive.
  Negation: May be negated 2 bus cycles after assertion.
This input has additional functionality in certain test modes.

7.2.9.7 Processor Status Signals
Processor status signals indicate the state of the processor. These include the memory reservation signal, machine quiesce control signals, time base enable signal, and TLBISYNC signal.

7.2.9.7.1 Quiescent Request (QREQ)
The quiescent request (QREQ) signal is output only. Following are the state meaning and timing comments for the QREQ signal.
State Meaning
  Asserted: Indicates that the 603e is requesting all bus activity normally required to be snooped to terminate or to pause so the 603e may enter a quiescent (low power) state. Once the 603e has entered a quiescent state, it no longer snoops bus activity.
  Negated: Indicates that the 603e is not making a request to enter the quiescent state.
Timing Comments
  Assertion/Negation: May occur on any cycle. QREQ will remain asserted for the duration of the quiescent state.

7.2.9.7.2 Quiescent Acknowledge (QACK)
The quiescent acknowledge (QACK) signal is input only. Following are the state meaning and timing comments for the QACK signal.
State Meaning
  Asserted: Indicates that all bus activity that requires snooping has terminated or paused, and that the 603e may enter the quiescent (or low power) state.
  Negated: Indicates that the 603e may not enter a quiescent state, and must continue snooping the bus.
Timing Comments
  Assertion/Negation: May occur on any cycle following the assertion of QREQ, and must be held asserted for a minimum of one bus clock cycle.
  Start-up: QACK is sampled at the negation of HRESET to select reduced-pinout mode; if QACK is asserted at start-up, reduced-pinout mode is disabled.

7.2.9.7.3 Reservation (RSRV) - Output
The reservation (RSRV) signal is output only on the 603e. Following are the state meaning and timing comments for the RSRV signal.
State Meaning
  Asserted/Negated: Represents the state of the reservation coherency bit in the reservation address register that is used by the lwarx and stwcx. instructions. See Section 8.8.1, "Support for the lwarx/stwcx. Instruction Pair."
Timing Comments
  Assertion/Negation: Occurs synchronously with respect to bus clock cycles. The execution of an lwarx instruction sets the internal reservation condition.

7.2.9.7.4 Time Base Enable (TBEN) - Input
The time base enable (TBEN) signal is input only on the 603e. Following are the state meanings and timing comments for the TBEN signal.
State Meaning
  Asserted: Indicates that the time base should continue clocking. This input is essentially a "count enable" control for the time base counter.
  Negated: Indicates the time base should stop clocking.
Timing Comments
  Assertion/Negation: May occur on any cycle.

7.2.9.7.5 TLBI Sync (TLBISYNC)
The TLBI sync (TLBISYNC) signal is input only on the 603e. Following are the state meanings and timing comments for the TLBISYNC signal.
State Meaning
  Asserted: Indicates that instruction execution should stop after execution of a tlbsync instruction.
  Negated: Indicates that instruction execution may continue or resume after the completion of a tlbsync instruction.
Timing Comments
  Assertion/Negation: May occur on any cycle.
  Start-up: TLBISYNC is sampled at the negation of HRESET to select 32-bit data bus mode; if TLBISYNC is negated at start-up, 32-bit mode is disabled and the default 64-bit mode is selected.

7.2.10 COP/Scan Interface
The 603e has extensive on-chip test capability including the following:
• Built-in instruction and data cache self test (BIST)
• Debug control/observation (COP)
• Boundary scan (IEEE 1149.1 compliant interface)
• LSSD test control
The BIST hardware is not exercised as part of the power-on reset (POR) sequence. The COP and boundary scan logic are not used under typical operating conditions. Detailed discussion of the 603e test functions is beyond the scope of this document; however, sufficient information has been provided to allow the system designer to disable the test functions that would impede normal operation. The COP/scan interface is shown in Figure 7-2. For more information, see Section 8.9, "IEEE 1149.1-Compliant Interface."

Figure 7-2. IEEE 1149.1-Compliant Boundary Scan Interface
[Figure: interface signals TDI (test data input), TMS (test mode select), TCK (test clock input), TDO (test data output), and TRST (test reset)]

7.2.11 Pipeline Tracking Support
The 603e provides for nonintrusive instruction pipeline tracking. Setting the HID0[EICE] bit causes the address parity and data parity signals to be redefined as outputs providing pipeline tracking information. These signals toggle at the CPU clock rate and will have special loading and timing requirements when in this mode.
Table 7-9 shows the outputs when HID0[EICE] is set. Given the object code, these signals provide sufficient information to track instruction execution (except for register indirect branches). Register indirect branches may be tracked either by examining and matching potential target streams (nonintrusive but not always resolvable), or by forcing register indirect branch targets to be fetched externally by setting HID0[FBIOB]. Setting HID0[EICE] also enables the processor clock to the CLK_OUT signal, which provides a synchronizing clock to the pipeline tracking outputs.

7.2.12 Clock Signals
The clock signal inputs of the 603e determine the system clock frequency and provide a flexible clocking scheme that allows the processor to operate at an integer multiple of the system clock frequency. Refer to the appropriate hardware specifications for exact timing relationships of the clock signals.

Table 7-9. Pipeline Tracking Outputs
Bit(s)     Function     Encoding
DP[0-1]    Fetch        00 None
                        01 Two
                        10 One
                        11 Branch
DP[2-3]    Retire       00 None
                        01 Two
                        10 One
                        11 Exception
DP[4-5]    Fold         00 None
                        01 First
                        10 Second
                        11 Both
DP[6-7]    Prediction   00 Nonspec
                        01 Spec_2nd
                        10 Spec_both
                        11 Flush_spec
AP[0-3]    FEA          FEA[20-23]
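
The encodings in Table 7-9 lend themselves to mechanical decoding of a captured trace. The following is a minimal sketch and not part of the manual: it assumes a capture tool that packs DP[0-7] into one byte with DP0 as the most significant bit (PowerPC bit numbering), and the names decode_dp, FETCH, RETIRE, FOLD, and PREDICTION are hypothetical.

```python
# Hypothetical decoder for the pipeline tracking outputs in Table 7-9.
# Assumption: DP[0-7] are packed into one byte with DP0 as the most
# significant bit, following PowerPC big-endian bit numbering.

FETCH = {0b00: "none", 0b01: "two", 0b10: "one", 0b11: "branch"}
RETIRE = {0b00: "none", 0b01: "two", 0b10: "one", 0b11: "exception"}
FOLD = {0b00: "none", 0b01: "first", 0b10: "second", 0b11: "both"}
PREDICTION = {0b00: "nonspec", 0b01: "spec_2nd",
              0b10: "spec_both", 0b11: "flush_spec"}


def decode_dp(dp: int) -> dict:
    """Split an 8-bit DP[0-7] sample into its four 2-bit fields."""
    if not 0 <= dp <= 0xFF:
        raise ValueError("DP sample must fit in 8 bits")
    return {
        "fetch": FETCH[(dp >> 6) & 0b11],       # DP[0-1]
        "retire": RETIRE[(dp >> 4) & 0b11],     # DP[2-3]
        "fold": FOLD[(dp >> 2) & 0b11],         # DP[4-5]
        "prediction": PREDICTION[dp & 0b11],    # DP[6-7]
    }


if __name__ == "__main__":
    # Fetch = 01 (two), retire = 10 (one), no fold, nonspeculative.
    print(decode_dp(0b01100000))
```

Samples would need to be taken on the CLK_OUT edge mentioned above; how a particular analyzer orders the bits is outside what the table specifies.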

7.2.12.1 System Clock (SYSCLK) - Input
The 603e requires a single system clock (SYSCLK) input. This input sets the frequency of operation for the bus interface. Internally, the 603e uses a phase-locked loop (PLL) circuit to generate a master clock for all of the CPU circuitry (including the bus interface circuitry) which is phase-locked to the SYSCLK input. The master clock may be set to an integer or half-integer multiple (1:1, 1.5:1, 2:1, 2.5:1, 3:1, 3.5:1, or 4:1) of the SYSCLK frequency, allowing the CPU core to operate at an equal or greater frequency than the bus interface.
State Meaning
  Asserted/Negated: The SYSCLK input is the primary clock input for the 603e, and represents the bus clock frequency for 603e bus operation. Internally, the 603e may be operating at an integer or half-integer multiple of the bus clock frequency.
Timing Comments
  Duty cycle: Refer to the appropriate hardware specifications for timing comments.
  Note: SYSCLK is used as the frequency reference for the internal PLL clock generator, and must not be suspended or varied during normal operation to ensure proper PLL operation.

7.2.12.2 Test Clock (CLK_OUT) - Output
The test clock (CLK_OUT) signal is an output-only signal on the 603e. Following are the state meaning and timing comments for the CLK_OUT signal.
State Meaning
  Asserted/Negated: Provides PLL clock output for PLL testing and monitoring. The CLK_OUT signal clocks at either the processor clock frequency, the bus clock frequency, or the half-bus clock frequency if enabled by the appropriate bits in the HID0 register; the default state of the CLK_OUT signal is high-impedance. The CLK_OUT signal is provided for testing purposes only.
Timing Comments
  Assertion/Negation: Refer to the appropriate hardware specifications for timing comments.

7.2.12.3 PLL Configuration (PLL_CFG[0-3]) - Input
The PLL (phase-locked loop) is configured by the PLL_CFG[0-3] signals.
For a given SYSCLK (bus) frequency, the PLL configuration signals set the internal CPU frequency of operation. Following are the state meaning and timing comments for the PLL_CFG[0-3] signals.
State Meaning
  Asserted/Negated: Configures the operation of the PLL and the internal processor clock frequency. Settings are based on the desired bus and internal frequency of operation.
Timing Comments
  Assertion/Negation: Must remain stable during operation; should only be changed during the assertion of HRESET or during sleep mode. These bits may be read through bits PC0-PC3 in the HID1 register.

Table 7-10. PLL Configuration
Bus, CPU, and PLL frequencies. Each entry is the CPU frequency in MHz, with the PLL frequency in MHz in parentheses; "-" marks a combination with no entry.

PLL_CFG  CPU/SYSCLK  Bus         Bus        Bus         Bus         Bus        Bus        Bus
[0-3]    ratio       16.6 MHz    20 MHz     25 MHz      33.3 MHz    40 MHz     50 MHz     66.6 MHz
0000     1:1         -           -          -           -           -          -          66.6 (133)
0001     1:1         -           -          -           33.3 (133)  40 (160)   50 (200)   -
0010     1:1         16.6 (133)  20 (160)   25 (200)    -           -          -          -
1100     1.5:1       -           -          -           -           -          75 (150)   100 (200)
0100     2:1         -           -          -           66.6 (133)  80 (160)   100 (200)  -
0101     2:1         33.3 (133)  40 (160)   50 (200)    -           -          -          -
0110     2.5:1       -           -          -           83.3 (166)  100 (200)  -          -
1000     3:1         -           -          75 (150)    100 (200)   -          -          -
1110     3.5:1       -           70 (140)   87.5 (175)  -           -          -          -
1010     4:1         66.6 (133)  80 (160)   100 (200)   -           -          -          -
0011     PLL bypass
1111     Clock off

Notes:
1. Some PLL configurations may select bus, CPU, or PLL frequencies which are not useful, not supported, or not tested for by the 603e. For complete and up-to-date information, refer to the appropriate hardware specifications. PLL frequencies, shown in parentheses, should not fall below 133 MHz, and should not exceed 200 MHz.
2. In PLL-bypass mode, the SYSCLK input signal clocks the internal processor directly, and the bus is set for 1:1 mode operation. In clock-off mode, no clocking occurs inside the 603e regardless of the SYSCLK input.
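
The relationship between PLL_CFG[0-3], the bus frequency, and the CPU and PLL frequencies in Table 7-10 can be checked with a few lines of arithmetic. The following is a rough sketch, not a definitive implementation: the PLL/CPU multipliers (2, 4, or 8) are inferred from the parenthesized values in the table rather than stated by the manual, and the names PLL_CFG, clock_plan, PLL_MIN_MHZ, and PLL_MAX_MHZ are hypothetical.

```python
# Hypothetical lookup built from Table 7-10.
# Assumption: the PLL/CPU multiplier for each setting is inferred from the
# parenthesized PLL frequencies in the table; the manual does not state it.

# PLL_CFG[0-3] -> (CPU/SYSCLK ratio, assumed PLL/CPU multiplier)
PLL_CFG = {
    "0000": (1.0, 2),
    "0001": (1.0, 4),
    "0010": (1.0, 8),
    "1100": (1.5, 2),
    "0100": (2.0, 2),
    "0101": (2.0, 4),
    "0110": (2.5, 2),
    "1000": (3.0, 2),
    "1110": (3.5, 2),
    "1010": (4.0, 2),
}

PLL_MIN_MHZ = 133.0   # note 1: PLL should not fall below 133 MHz
PLL_MAX_MHZ = 200.0   # note 1: PLL should not exceed 200 MHz


def clock_plan(pll_cfg: str, sysclk_mhz: float):
    """Return (cpu_mhz, pll_mhz, pll_in_range) for one PLL_CFG setting."""
    if pll_cfg not in PLL_CFG:
        raise ValueError("PLL bypass, clock off, or unlisted setting: " + pll_cfg)
    ratio, pll_mult = PLL_CFG[pll_cfg]
    cpu_mhz = sysclk_mhz * ratio
    pll_mhz = cpu_mhz * pll_mult
    return cpu_mhz, pll_mhz, PLL_MIN_MHZ <= pll_mhz <= PLL_MAX_MHZ


if __name__ == "__main__":
    print(clock_plan("1100", 50.0))   # 1.5:1 with a 50 MHz bus
    print(clock_plan("0001", 25.0))   # 1:1 with a 25 MHz bus: PLL too slow
```

The table rounds frequencies such as 16.6 and 33.3 MHz, so a real check would probably want a small tolerance around the 133 MHz floor; the hardware specifications remain the authority on which combinations are supported.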

7.2.13 Power and Ground Signals
The 603e provides the following connections for power and ground:
• Vdd and OVdd: The Vdd and OVdd signals provide the connection for the supply voltage. On the 603e, there is no electrical distinction between the Vdd and the OVdd signals.
• AVdd: The AVdd power signal provides power to the clock generation phase-locked loop. See the appropriate hardware specifications for information on how to use this signal.
• GND and OGND: The GND and OGND signals provide the connection for grounding the 603e. On the 603e, there is no electrical distinction between the GND and OGND signals.

Chapter 8
System Interface Operation

This chapter describes the PowerPC 603e microprocessor's bus interface and its operation. It shows how the 603e signals, defined in Chapter 7, "Signal Descriptions," interact to perform address and data transfers.

8.1 Overview
The system interface prioritizes requests for bus operations from the instruction and data caches, and performs bus operations per the 603e bus protocol. It includes address register queues, prioritization logic, and a bus control unit. The system interface latches snoop addresses for snooping in the data cache and in the address register queues, snoops for direct-store reply operations and for reservations controlled by the load word and reserve indexed (lwarx) and store word conditional indexed (stwcx.) instructions, and maintains the touch load address for the cache. The interface allows one level of pipelining; that is, with certain restrictions discussed later, there can be two outstanding transactions at any given time. Accesses are prioritized with load operations preceding store operations.

Instructions are automatically fetched from the memory system into the instruction unit, where they are dispatched to the execution units at a peak rate of three instructions per clock. Conversely, load and store instructions explicitly specify the movement of operands to and from the integer and floating-point register files and the memory system. (The EC603e microprocessor does not support the floating-point register files.)

When the 603e encounters an instruction or data access, it calculates the logical address (effective address in the architecture specification) and uses the low-order address bits to check for a hit in the on-chip, 16-Kbyte instruction and data caches.
During cache lookup, the instruction and data memory management units (MMUs) use the higher-order address bits to calculate the virtual address, from which they calculate the physical address (real address in the architecture specification). The physical address bits are then compared with the corresponding cache tag bits to determine if a cache hit occurred. If the access misses in the corresponding cache, the physical address is used to access system memory.

In addition to the loads, stores, and instruction fetches, the 603e performs software table search operations following TLB misses, cache cast-out operations when least-recently used cache lines are written to memory after a cache miss, and cache-line snoop push-out operations when a modified cache line experiences a snoop hit from another bus master.

Figure 8-1 shows the address path from the execution units and instruction fetcher, through the translation logic to the caches and system interface logic.

The 603e uses separate address and data buses and a variety of control and status signals for performing reads and writes. The address bus is 32 bits wide and the data bus can be configured to be 32 or 64 bits wide. The interface is synchronous: all 603e inputs are sampled at and all outputs are driven from the rising edge of the bus clock. The bus can run at the full processor-clock frequency or at an integer division of the processor-clock speed. While the 603e operates at 3.3 volts, all the I/O signals are 5.0-volt TTL-compatible.

8.1.1 Operation of the Instruction and Data Caches
The 603e provides independent instruction and data caches. Each cache is a physically-addressed, 16-Kbyte cache with four-way set associativity. Both caches consist of 128 sets of four cache lines, with eight words in each cache line.

Because the data cache on the 603e is an on-chip, write-back primary cache, the predominant type of transaction for most applications is burst-read memory operations, followed by burst-write memory operations, direct-store operations, and single-beat (noncacheable or write-through) memory read and write operations. Additionally, there can be address-only operations, variants of the burst and single-beat operations (global memory operations that are snooped, and atomic memory operations, for example), and address retry activity (for example, when a snooped read access hits a modified line in the cache).

Since the 603e data cache tags are single ported, simultaneous load or store and snoop accesses cause resource contention. Snoop accesses have the highest priority and are given first access to the tags, unless the snoop access coincides with a tag write, in which case the snoop is retried and must re-arbitrate for access to the cache.
Loads or stores that are deferred due to snoop accesses are performed on the clock cycle following the snoop.

The 603e supports a three-state coherency protocol that supports the modified, exclusive, and invalid (MEI) cache states. The protocol is a subset of the MESI (modified/exclusive/shared/invalid) four-state protocol and operates coherently in systems that contain four-state caches. With the exception of the dcbz instruction, the 603e does not broadcast cache control instructions. The cache control instructions are intended for the management of the local cache but not for other caches in the system.

Cache lines in the 603e are loaded in four beats of 64 bits each (or eight beats of 32 bits each when operating in 32-bit bus mode). The burst load is performed as "critical double word first." The cache that is being loaded is blocked to internal accesses until the load completes (that is, no hits under misses). The critical double word is simultaneously written to the cache and forwarded to the requesting unit, thus minimizing stalls due to load delays.
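
The cache geometry described above (128 sets of four 32-byte lines) implies a fixed split of a physical address into tag, set index, and byte offset, and "critical double word first" implies a beat order for 64-bit bursts. The following is an illustrative sketch only: the function names are hypothetical, and the wrap-around order after the critical double word is an assumption, since the text states only which double word arrives first.

```python
# Sketch of the cache geometry: 128 sets x 4 ways x 32-byte lines = 16 Kbytes.
# Function names are hypothetical. The wrap-around beat order is an
# assumption; the text only says "critical double word first".

LINE_BYTES = 32      # eight 32-bit words per cache line
NUM_SETS = 128
NUM_WAYS = 4
DWORD_BYTES = 8      # one 64-bit data beat

# Sanity check: the geometry adds up to a 16-Kbyte cache.
assert LINE_BYTES * NUM_SETS * NUM_WAYS == 16 * 1024


def split_address(phys_addr: int):
    """Return (tag, set_index, byte_offset) for a 32-bit physical address."""
    byte_offset = phys_addr & (LINE_BYTES - 1)       # low 5 bits
    set_index = (phys_addr >> 5) & (NUM_SETS - 1)    # next 7 bits
    tag = phys_addr >> 12                            # remaining 20 bits
    return tag, set_index, byte_offset


def burst_order(phys_addr: int):
    """Double-word indices (0-3) in assumed delivery order for a 64-bit burst."""
    beats = LINE_BYTES // DWORD_BYTES
    critical = (phys_addr & (LINE_BYTES - 1)) // DWORD_BYTES
    return [(critical + i) % beats for i in range(beats)]


if __name__ == "__main__":
    print(split_address(0x00012F68))
    print(burst_order(0x00012F68))
```

Because the offset and set index together occupy the low 12 address bits, the set can be selected while the MMU is still translating the upper bits, which matches the lookup sequence described in Section 8.1.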

Figure 8-1. Block Diagram
[Block diagram. Labeled blocks: instruction unit (sequential fetcher, branch processing unit, instruction queue, CTR, CR, LR); dispatch unit; integer unit; load/store unit; floating-point unit*; system register unit; completion unit; GPR file and GP rename registers; FPR file* and FP rename registers; XER; FPSCR; D MMU (SRs, DTLB, DBAT array); I MMU (SRs, ITLB, IBAT array); 16-Kbyte D cache and tags; 16-Kbyte I cache and tags; processor bus interface (touch load buffer, copyback buffer); 32-bit address bus; 32-/64-bit data bus; time base counter/decrementer; clock multiplier; JTAG/COP interface; power dissipation control.]
* Note that the EC603e microprocessor does not support the floating-point unit or the floating-point register file.

Cache lines are selected for replacement based on an LRU (least recently used) algorithm. Each time a cache line is accessed, it is tagged as the most recently used line of the set. When a miss occurs, if all four lines in the set are marked as valid, the least recently used line is replaced with the new data. When data to be replaced is in the modified state, the modified data is written into a write-back buffer while the missed data is being read from memory. When the load completes, the 603e then pushes the replaced line from the write-back buffer to main memory in a burst write operation.

8.1.2 Operation of the System Interface
Memory accesses can occur in single-beat (1-8 bytes) and four-beat (32 bytes) burst data transfers when the 603e is configured with a 64-bit data bus. When the 603e is in the optional 32-bit data bus mode, memory accesses can occur in single-beat (1 to 4 bytes), two-beat (8 bytes), and eight-beat (32 bytes) bursts. The address and data buses are independent for memory accesses to support pipelining and split transactions. The 603e can pipeline as many as two transactions and has limited support for out-of-order split-bus transactions.

Access to the system interface is granted through an external arbitration mechanism that allows devices to compete for bus mastership. This arbitration mechanism is flexible, allowing the 603e to be integrated into systems that implement various fairness and bus-parking procedures to avoid arbitration overhead.

Typically, memory accesses are weakly ordered: sequences of operations, including load/store string and multiple instructions, do not necessarily complete in the order they begin. This maximizes the efficiency of the bus without sacrificing coherency of the data. The 603e allows load operations to precede store operations (except when a dependency exists).
In addition, the 603e can be configured to reorder high-priority store operations ahead of lower-priority store operations. Because the processor can dynamically optimize run-time ordering of load/store traffic, overall performance is improved. Note that the synchronize (sync) instruction can be used to enforce strong ordering.

The following sections describe how the 603e interface operates, providing detailed timing diagrams that illustrate how the signals interact. A collection of more general timing diagrams is included as examples of typical bus operations.

Figure 8-2 is a legend of the conventions used in the timing diagrams.

This is a synchronous interface: all 603e input signals are sampled and output signals are driven on the rising edge of the bus clock cycle (see the PowerPC 603e RISC Microprocessor Hardware Specifications for exact timing information).
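
The transfer sizes listed in Section 8.1.2 for the two data bus widths can be tabulated as a small helper. This is a minimal sketch; the function name data_beats is hypothetical, and sizes that the section does not list (for example, 5 to 7 bytes on a 32-bit bus) are deliberately rejected rather than guessed.

```python
# Hypothetical helper tabulating the transfer sizes given in Section 8.1.2.
# Only the sizes the section lists are accepted; anything else is rejected.

def data_beats(size_bytes: int, bus_bits: int) -> int:
    """Number of data beats for one transfer on a 64- or 32-bit data bus."""
    if bus_bits == 64:
        if 1 <= size_bytes <= 8:
            return 1          # single-beat transfer, 1-8 bytes
        if size_bytes == 32:
            return 4          # four-beat cache-line burst
    elif bus_bits == 32:
        if 1 <= size_bytes <= 4:
            return 1          # single-beat transfer, 1-4 bytes
        if size_bytes == 8:
            return 2          # two-beat transfer
        if size_bytes == 32:
            return 8          # eight-beat cache-line burst
    else:
        raise ValueError("data bus is either 32 or 64 bits wide")
    raise ValueError("transfer size not listed in Section 8.1.2")


if __name__ == "__main__":
    for bus in (64, 32):
        print(bus, [data_beats(n, bus) for n in (1, 4, 8, 32)])
```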

Figure 8-2. Timing Diagram Legend
[Legend entries, with example signal labels AP0, BR, ADDR+, and qual BG:
• 603e input (while 603e is a bus master)
• 603e output (while 603e is a bus master)
• 603e output (grouped: here, address plus attributes)
• 603e internal signal (inaccessible to the user, but used in diagrams to clarify operations)
• Compelling dependency: event will occur on the next clock cycle
• Prerequisite dependency: event will occur on an undetermined subsequent clock cycle
• 603e three-state output or input
• 603e nonsampled input
• Signal with sample point
• A sampled condition (dot on high or low state) with multiple dependencies
• Timing for a signal had it been asserted (it is not actually asserted)
• Bar over signal name indicates active low]

8.1.2.1 Optional 32-Bit Data Bus Mode
The 603e supports an optional 32-bit data bus mode. The 32-bit data bus mode operates the same as the 64-bit data bus mode with the exception of the byte lanes involved in the transfer and the number of data beats that are performed. The number of data beats required for a data tenure in the 32-bit data bus mode is one, two, or eight beats depending on the size of the program transaction and the cache mode for the address. For additional information about 32-bit data bus mode, see Section 8.6.1, "32-Bit Data Bus Mode."

8.1.3 Direct-Store Accesses
The 603e does not support the extended transfer protocol for accesses to the direct-store storage space. The transfer protocol used for any given access is selected by the T bit in the MMU segment registers; if the T bit is set, the memory access is a direct-store access. An attempt to access a direct-store segment will result in the 603e taking a DSI exception.

8.2 Memory Access Protocol
Memory accesses are divided into address and data tenures. Each tenure has three phases: bus arbitration, transfer, and termination. The 603e also supports address-only transactions. Note that address and data tenures can overlap, as shown in Figure 8-3.

Figure 8-3 shows that the address and data tenures are distinct from one another and that both consist of three phases: arbitration, transfer, and termination. Address and data tenures are independent (indicated in Figure 8-3 by the fact that the data tenure begins before the address tenure ends), which allows split-bus transactions to be implemented at the system level in multiprocessor systems. Figure 8-3 shows a data transfer that consists of a single-beat transfer of as many as 64 bits. Four-beat burst transfers of 32-byte cache lines require data transfer termination signals for each beat of data.

Figure 8-3. Overlapping Tenures on the Bus for a Single-Beat Transfer
[Figure: address tenure (arbitration, transfer, termination) overlapping the data tenure (arbitration, single-beat transfer, termination); address and data are independent]

The basic functions of the address and data tenures are as follows:
• Address tenure
  - Arbitration: During arbitration, address bus arbitration signals are used to gain mastership of the address bus.
  - Transfer: After the 603e is the address bus master, it transfers the address on the address bus. The address signals and the transfer attribute signals control the address transfer. The address parity and address parity error signals ensure the integrity of the address transfer.
  - Termination: After the address transfer, the system signals that the address tenure is complete or that it must be repeated.
• Data tenure
  - Arbitration: To begin the data tenure, the 603e arbitrates for mastership of the data bus.
  - Transfer: After the 603e is the data bus master, it samples the data bus for read operations or drives the data bus for write operations. The data parity and data parity error signals ensure the integrity of the data transfer.
  - Termination: Data termination signals are required after each data beat in a data transfer. Note that in a single-beat transaction, the data termination signals also indicate the end of the tenure, while in burst accesses, the data termination signals apply to individual beats and indicate the end of the tenure only after the final data beat.

The 603e generates an address-only bus transfer during the execution of the dcbz instruction, which uses only the address bus with no data transfer involved. Additionally, the 603e's retry capability provides an efficient snooping protocol for systems with multiple memory systems (including caches) that must remain coherent.

8.2.1 Arbitration Signals
Arbitration for both address and data bus mastership is performed by a central, external arbiter and, minimally, by the arbitration signals shown in Section 7.2.1, "Address Bus Arbitration Signals." Most arbiter implementations require additional signals to coordinate bus master/slave/snooping activities.
Note that address bus busy (ABB) and data bus busy (DBB) are bidirectional signals. These signals are inputs unless the 603e has mastership of one or both of the respective buses; they must be connected high through pull-up resistors so that they remain negated when no devices have control of the buses.

The following list describes the address arbitration signals:
• BR (bus request): Assertion indicates that the 603e is requesting mastership of the address bus.
• BG (bus grant): Assertion indicates that the 603e may, with the proper qualification, assume mastership of the address bus. A qualified bus grant occurs when BG is asserted and ABB and ARTRY are negated. If the 603e is parked, BR need not be asserted for the qualified bus grant.
• ABB (address bus busy): Assertion by the 603e indicates that the 603e is the address bus master.

The following list describes the data arbitration signals:
• DBG (data bus grant): Indicates that the 603e may, with the proper qualification, assume mastership of the data bus. A qualified data bus grant occurs when DBG is asserted while DBB, DRTRY, and ARTRY are negated. The DBB signal is driven by the current bus master, DRTRY is only driven from the bus, and ARTRY is from the bus, but only for the address bus tenure associated with the current data bus tenure (that is, not from another address tenure).
• DBWO (data bus write only): Assertion indicates that the 603e may perform the data bus tenure for an outstanding write address even if a read address is pipelined before the write address. If DBWO is asserted, the 603e will assume data bus mastership for a pending data bus write operation; the 603e will take the data bus for a pending read operation if this input is asserted along with DBG and no write is pending. Care must be taken with DBWO to ensure the desired write is queued (for example, a cache-line snoop push-out operation).
• DBB (data bus busy): Assertion by the 603e indicates that the 603e is the data bus master. The 603e always assumes data bus mastership if it needs the data bus and is given a qualified data bus grant (see DBG).
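
The two grant qualification rules in the lists above reduce to simple Boolean predicates, which can be handy when writing an arbiter model or a bus monitor. The sketch below is illustrative only: the function names are hypothetical, and each argument is True when the named signal is asserted, independent of the electrical level on the pin.

```python
# Sketch of the grant qualification rules listed above.
# Each argument is True when the corresponding signal is ASSERTED, whatever
# the electrical level of the pin. Function names are hypothetical.

def qualified_bus_grant(bg: bool, abb: bool, artry: bool) -> bool:
    """Address bus: BG asserted while ABB and ARTRY are negated."""
    return bg and not abb and not artry


def qualified_data_bus_grant(dbg: bool, dbb: bool,
                             drtry: bool, artry: bool) -> bool:
    """Data bus: DBG asserted while DBB, DRTRY, and ARTRY are negated.

    ARTRY here refers only to the address tenure associated with the
    data tenure being granted, not to any other address tenure.
    """
    return dbg and not dbb and not drtry and not artry


if __name__ == "__main__":
    # Bus idle and granted: the master may take the address bus.
    print(qualified_bus_grant(bg=True, abb=False, artry=False))
    # Another master still holds the data bus: the grant is not qualified.
    print(qualified_data_bus_grant(dbg=True, dbb=True, drtry=False, artry=False))
```

Note that BR does not appear in the address bus predicate, which reflects the parked case: a qualified bus grant can exist without the 603e having requested the bus.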
For more detailed information on the arbitration signals, refer to Section 7.2.1, "Address Bus Arbitration Signals," and Section 7.2.6, "Data Bus Arbitration Signals."

8.2.2 Address Pipelining and Split-Bus Transactions
The 603e protocol provides independent address and data bus capability to support pipelined and split-bus transaction system organizations. Address pipelining allows the address tenure of a new bus transaction to begin before the data tenure of the current transaction has finished. Split-bus transaction capability allows other bus activity to occur (either from the same master or from different masters) between the address and data tenures of a transaction.

While this capability does not inherently reduce memory latency, support for address pipelining and split-bus transactions can greatly improve effective bus/memory throughput. For this reason, these techniques are most effective in shared-memory multiprocessor implementations where bus bandwidth is an important measurement of system performance.

External arbitration is required in systems in which multiple devices must compete for the system bus. The design of the external arbiter affects pipelining by regulating address bus grant (BG), data bus grant (DBG), and address acknowledge (AACK) signals. For example, a one-level pipeline is enabled by asserting AACK to the current address bus master and granting mastership of the address bus to the next requesting master before the current data bus tenure has completed. Two address tenures can occur before the current data bus tenure completes.

The 603e can pipeline its own transactions to a depth of one level (intraprocessor pipelining); however, the 603e bus protocol does not constrain the maximum number of levels of pipelining that can occur on the bus between multiple masters (interprocessor pipelining). The external arbiter must control the pipeline depth and synchronization between masters and slaves.

In a pipelined implementation, data bus tenures are kept in strict order with respect to address tenures. However, external hardware can further decouple the address and data buses, allowing the data tenures to occur out of order with respect to the address tenures. This requires some form of system tag to associate the out-of-order data transaction with the proper originating address transaction (not defined for the 603e interface). Individual bus requests and data bus grants from each processor can be used by the system to implement tags to support interprocessor, out-of-order transactions.

The 603e supports a limited intraprocessor out-of-order, split-transaction capability via the data bus write only (DBWO) signal. For more information about using DBWO, see Section 8.10, "Using Data Bus Write Only."
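
One-level intraprocessor pipelining with strictly ordered data tenures behaves like a two-entry first-in, first-out queue. The following toy model is a sketch under that reading, not a description of the 603e's internal logic: the class and method names are hypothetical, and the out-of-order case enabled by DBWO is deliberately not modeled.

```python
# Toy model of one-level address pipelining for a single master.
# Class and method names are hypothetical; DBWO reordering is not modeled.

from collections import deque


class TransactionTracker:
    # One level of pipelining means at most two transactions in flight.
    MAX_OUTSTANDING = 2

    def __init__(self):
        # Address tenures that have completed but still await a data tenure.
        self._pending = deque()

    def can_start_address_tenure(self) -> bool:
        return len(self._pending) < self.MAX_OUTSTANDING

    def start_address_tenure(self, tag: str) -> None:
        if not self.can_start_address_tenure():
            raise RuntimeError("pipeline full: wait for a data tenure to finish")
        self._pending.append(tag)

    def finish_data_tenure(self) -> str:
        """Data tenures complete in the same order as their address tenures."""
        return self._pending.popleft()


if __name__ == "__main__":
    bus = TransactionTracker()
    bus.start_address_tenure("read A")
    bus.start_address_tenure("write B")      # pipelined behind "read A"
    print(bus.can_start_address_tenure())    # pipeline is now full
    print(bus.finish_data_tenure())          # oldest transaction finishes first
```

In a real system the external arbiter enforces the depth by withholding BG and AACK; the tracker above only mirrors the bookkeeping a bus monitor might keep.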

8.3 Address Bus Tenure
This section describes the three phases of the address tenure: address bus arbitration, address transfer, and address termination.

8.3.1 Address Bus Arbitration
When the 603e needs access to the external bus and it is not parked (BG is negated), it asserts bus request (BR) until it is granted mastership of the bus and the bus is available (see Figure 8-4). The external arbiter must grant master-elect status to the potential master by asserting the bus grant (BG) signal. The 603e requesting the bus determines that the bus is available when the ABB input is negated. The condition in which the address bus is not busy (ABB input is negated), BG is asserted, and the address retry (ARTRY) input is negated is referred to as a qualified bus grant. The potential master assumes address bus mastership by asserting ABB when it receives a qualified bus grant.

Figure 8-4. Address Bus Arbitration
[Timing diagram over logical bus clocks -1, 0, and 1; signals shown: need_bus, BR, BG, ABB, ARTRY, qual BG, ABB]

External arbiters must allow only one device at a time to be the address bus master. In implementations in which no other device can be a master, BG can be grounded (always asserted) to continually grant mastership of the address bus to the 603e.

If the 603e asserts BR before the external arbiter asserts BG, the 603e is considered to be unparked, as shown in Figure 8-4. Figure 8-5 shows the parked case, where a qualified bus grant exists on the clock edge following a need_bus condition. Notice that the bus clock cycle required for arbitration is eliminated if the 603e is parked, reducing overall memory latency for a transaction. The 603e always negates ABB for at least one bus clock cycle after AACK is asserted, even if it is parked and has another transaction pending.

Typically, bus parking is provided to the device that was the most recent bus master; however, system designers may choose other schemes such as providing unrequested bus grants in situations where it is easy to correctly predict the next device requesting bus mastership.
[Figure 8-5. Address Bus Arbitration Showing Bus Parking. Signals shown: need_bus, BR, BG, ABB, ARTRY, qual BG, ABB; bus clocks -1, 0, 1.]

When the 603e receives a qualified bus grant, it assumes address bus mastership by asserting ABB and negating the BR output signal. Meanwhile, the 603e drives the address for the requested access onto the address bus and asserts TS to indicate the start of a new transaction.

When designing external bus arbitration logic, note that the 603e may assert BR without using the bus after it receives the qualified bus grant. For example, in a system using bus snooping, if the 603e asserts BR to perform a replacement copy-back operation, another device can invalidate that line before the 603e is granted mastership of the bus. Once the 603e is granted the bus, it no longer needs to perform the copy-back operation; therefore, the 603e does not assert ABB and does not use the bus for the copy-back operation. Note that the 603e asserts BR for at least one clock cycle in these instances.

8.3.2 Address Transfer

During the address transfer, the physical address and all attributes of the transaction are transferred from the bus master to the slave device(s). Snooping logic may monitor the transfer to enforce cache coherency; see the discussion of snooping in Section 8.3.3, "Address Transfer Termination."
The signals used in the address transfer include the following signal groups:

- Address transfer start signal: transfer start (TS)
- Address transfer signals: address bus (A[0-31]), address parity (AP[0-3]), and address parity error (APE)
- Address transfer attribute signals: transfer type (TT[0-4]), transfer code (TC[0-1]), transfer size (TSIZ[0-2]), transfer burst (TBST), cache inhibit (CI), write-through (WT), global (GBL), and cache set element (CSE[0-1])

Figure 8-6 shows that the timing for all of these signals, except TS and APE, is identical. All of the address transfer and address transfer attribute signals are combined into the ADDR+ grouping in Figure 8-6.

The TS signal indicates that the 603e has begun an address transfer and that the address and transfer attributes are valid (within the context of a synchronous bus). The 603e always asserts TS coincident with ABB. As an input, TS need not coincide with the assertion of ABB on the bus (that is, TS can be asserted with ABB or on a subsequent clock cycle after ABB is asserted; the 603e tracks this transaction correctly).

[Figure 8-6. Address Bus Transfer. Signals shown: qual BG, TS, ABB, ADDR+, AACK, ARTRY_IN; bus clocks 0 to 4.]

In Figure 8-6, the address transfer occurs during bus clock cycles 1 and 2 (arbitration occurs in bus clock cycle 0 and the address transfer is terminated in bus clock cycle 3). In this diagram, the address bus termination input, AACK, is asserted to the 603e on the bus clock following assertion of TS (as shown by the dependency line). This is the minimum duration of the address transfer for the 603e; the duration can be extended by delaying the assertion of AACK for one or more bus clocks.
8.3.2.1 Address Bus Parity

The 603e always generates 1 bit of correct odd-byte parity for each of the 4 bytes of address when a valid address is on the bus. The calculated values are placed on the AP[0-3] outputs when the 603e is the address bus master. If the 603e is not the master and TS and GBL are asserted together (qualified condition for snooping memory operations), the calculated values are compared with the AP[0-3] inputs. If there is an error, and address parity checking is enabled (HID0[EBA] set to 1), the APE output is asserted. An address bus parity error causes a checkstop condition if MSR[ME] is cleared to 0. For more information about checkstop conditions, see Chapter 4, "Exceptions."

8.3.2.2 Address Transfer Attribute Signals

The transfer attribute signals include several encoded signals such as the transfer type (TT[0-4]) signals, transfer burst (TBST) signal, transfer size (TSIZ[0-2]) signals, and transfer code (TC[0-1]) signals. Section 7.2.4, "Address Transfer Attribute Signals," describes the encodings for the address transfer attribute signals.

8.3.2.2.1 Transfer Type (TT[0-4]) Signals

Snooping logic should fully decode the transfer type signals if the GBL signal is asserted. Slave devices can sometimes use the individual transfer type signals without fully decoding the group. For a complete description of the encoding for transfer type signals TT[0-4], refer to Table 8-1 and Table 8-2.

8.3.2.2.2 Transfer Size (TSIZ[0-2]) Signals

The transfer size signals (TSIZ[0-2]) indicate the size of the requested data transfer as shown in Table 8-1. The TSIZ[0-2] signals may be used along with TBST and A[29-31] to determine which portion of the data bus contains valid data for a write transaction or which portion of the bus should contain valid data for a read transaction.

Note that for a burst transaction (as indicated by the assertion of TBST), TSIZ[0-2] are always set to 0b010. Therefore, if the TBST signal is asserted, the memory system should transfer a total of eight words (32 bytes), regardless of the TSIZ[0-2] encoding.
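As an illustration of how a memory controller model might apply this rule, the following sketch maps TBST and TSIZ[0-2] to a byte count using the encodings of Table 8-1. The function name is hypothetical, TSIZ is passed as a 3-bit integer with TSIZ0 as the most-significant bit, and TBST is a boolean where True means "asserted".

```python
# Single-beat encodings from Table 8-1 (TSIZ0 is the most-significant bit).
SINGLE_BEAT_BYTES = {0b000: 8, 0b001: 1, 0b010: 2, 0b011: 3, 0b100: 4}


def transfer_bytes(tbst_asserted: bool, tsiz: int) -> int:
    """Return the number of bytes requested by a bus transaction."""
    if tbst_asserted:
        # A burst always moves eight words, whatever TSIZ[0-2] shows.
        return 32
    if tsiz not in SINGLE_BEAT_BYTES:
        # 0b101, 0b110, and 0b111 (5, 6, 7 bytes) are never generated by the 603e.
        raise ValueError(f"unsupported TSIZ encoding: {tsiz:#05b}")
    return SINGLE_BEAT_BYTES[tsiz]


if __name__ == "__main__":
    print(transfer_bytes(True, 0b010))   # burst
    print(transfer_bytes(False, 0b010))  # two-byte single beat
```

Checking TBST before TSIZ matters because the 0b010 encoding means "two bytes" only when TBST is negated.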
Table 8-1. Transfer Size Signal Encodings

TBST      TSIZ0  TSIZ1  TSIZ2  Transfer Size
Asserted  0      1      0      Eight-word burst
Negated   0      0      0      Eight bytes
Negated   0      0      1      One byte
Negated   0      1      0      Two bytes
Negated   0      1      1      Three bytes
Negated   1      0      0      Four bytes
Negated   1      0      1      Five bytes (N/A)
Negated   1      1      0      Six bytes (N/A)
Negated   1      1      1      Seven bytes (N/A)

The basic coherency size of the bus is defined to be 32 bytes (corresponding to one cache line). Data transfers that cross an aligned, 32-byte boundary either must present a new address onto the bus at that boundary (for coherency consideration) or must operate as noncoherent data with respect to the 603e. The 603e never generates a bus transaction with a transfer size of 5 bytes, 6 bytes, or 7 bytes.

8.3.2.3 Burst Ordering during Data Transfers

During burst data transfer operations, 32 bytes of data (one cache line) are transferred to or from the cache in order. Burst write transfers are always performed zero double word first, but since burst reads are performed critical double word first, a burst read transfer may not start with the first double word of the cache line, and the cache line fill may wrap around the end of the cache line. This section describes the burst ordering for the 603e when operating in either the 64- or 32-bit bus mode.

Table 8-2 describes the burst ordering when the 603e is configured with a 64-bit data bus.

Table 8-2. Burst Ordering: 64-Bit Bus

                   For starting address:
Data Transfer      A[27-28] = 00  A[27-28] = 01  A[27-28] = 10  A[27-28] = 11
First data beat    DW0            DW1            DW2            DW3
Second data beat   DW1            DW2            DW3            DW0
Third data beat    DW2            DW3            DW0            DW1
Fourth data beat   DW3            DW0            DW1            DW2

Note: A[29-31] are always 0b000 for burst transfers by the 603e.
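The wraparound order in Table 8-2 amounts to counting double words modulo 4, starting at the critical double word selected by A[27-28]. The sketch below reproduces the table for the 64-bit bus; the function name is hypothetical and the starting index is passed as an integer from 0 to 3.

```python
def burst_read_order_64(a27_28: int) -> list[str]:
    """Double words of a cache line in the order a 64-bit burst read delivers them."""
    if not 0 <= a27_28 <= 3:
        raise ValueError("A[27-28] is a 2-bit field")
    # Critical double word first, then wrap around the end of the cache line.
    return [f"DW{(a27_28 + beat) % 4}" for beat in range(4)]


if __name__ == "__main__":
    for start in range(4):
        print(f"A[27-28] = {start:02b}:", burst_read_order_64(start))
```

Burst writes do not need this calculation, since they always begin with double word 0.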
Table 8-3 describes the burst ordering when the 603e is configured with a 32-bit bus.

Table 8-3. Burst Ordering: 32-Bit Bus

                    For starting address:
Data Transfer       A[27-28] = 00  A[27-28] = 01  A[27-28] = 10  A[27-28] = 11
First data beat     DW0-U          DW1-U          DW2-U          DW3-U
Second data beat    DW0-L          DW1-L          DW2-L          DW3-L
Third data beat     DW1-U          DW2-U          DW3-U          DW0-U
Fourth data beat    DW1-L          DW2-L          DW3-L          DW0-L
Fifth data beat     DW2-U          DW3-U          DW0-U          DW1-U
Sixth data beat     DW2-L          DW3-L          DW0-L          DW1-L
Seventh data beat   DW3-U          DW0-U          DW1-U          DW2-U
Eighth data beat    DW3-L          DW0-L          DW1-L          DW2-L

Notes: A[29-31] are always 0b000 for burst transfers by the 603e. "U" and "L" represent the upper and lower word of the double word, respectively.

8.3.2.4 Effect of Alignment in Data Transfers (64-Bit Bus)

Table 8-4 lists the aligned transfers that can occur on the 603e bus when configured with a 64-bit width. These are transfers in which the data is aligned to an address that is an integer multiple of the size of the data. For example, Table 8-4 shows that 1-byte data is always aligned; however, for a 4-byte word to be aligned, it must be oriented on an address that is a multiple of 4.

Table 8-4. Aligned Data Transfers (64-Bit Bus)

                                              Data Bus Byte Lane(s)
                                              (A: used, -: not used)
Transfer Size  TSIZ0  TSIZ1  TSIZ2  A[29-31]  0 1 2 3 4 5 6 7
Byte           0      0      1      000       A - - - - - - -
               0      0      1      001       - A - - - - - -
               0      0      1      010       - - A - - - - -
               0      0      1      011       - - - A - - - -
               0      0      1      100       - - - - A - - -
               0      0      1      101       - - - - - A - -
               0      0      1      110       - - - - - - A -
               0      0      1      111       - - - - - - - A
Table 8-4. Aligned Data Transfers (64-Bit Bus) (Continued)

Transfer Size  TSIZ0  TSIZ1  TSIZ2  A[29-31]  0 1 2 3 4 5 6 7
Half word      0      1      0      000       A A - - - - - -
               0      1      0      010       - - A A - - - -
               0      1      0      100       - - - - A A - -
               0      1      0      110       - - - - - - A A
Word           1      0      0      000       A A A A - - - -
               1      0      0      100       - - - - A A A A
Double word    0      0      0      000       A A A A A A A A

Notes:
A: These entries indicate the byte portions of the requested operand that are read or written during that bus transaction.
-: These entries are not required and are ignored during read transactions and are driven with undefined data during all write transactions.

The 603e supports misaligned memory operations, although their use may substantially degrade performance. Misaligned memory transfers address memory that is not aligned to the size of the data being transferred (for example, a word read from an odd byte address). Although most of these operations hit in the primary cache (or generate burst memory operations if they miss), the 603e interface supports misaligned transfers within a word (32-bit aligned) boundary, as shown in Table 8-5. Note that the 4-byte transfer in Table 8-5 is only one example of misalignment. As long as the attempted transfer does not cross a word boundary, the 603e can transfer the data on the misaligned address (for example, a half-word read from an odd byte-aligned address). An attempt to address data that crosses a word boundary requires two bus transfers to access the data. Note that an attempt to load or store a floating-point operand that is not word-aligned will result in a floating-point alignment exception. For more information, refer to Section 4.5.6, "Alignment Exception (0x00600)."

Due to the performance degradations associated with misaligned memory operations, they are best avoided. In addition to the double-word straddle boundary condition, the address translation logic can generate substantial exception overhead when the load/store multiple and load/store string instructions access misaligned data. It is strongly recommended that software attempt to align code and data where possible.
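The word-boundary rule can be sketched as a function that splits an access of up to four bytes into the bus transfers it requires. This is an illustrative model under stated assumptions, not the 603e's implementation: the function name is hypothetical, addresses are plain integers, and each transfer is reported as (A[29-31], byte count, TSIZ[0-2]) with TSIZ given as a 3-bit integer.

```python
# TSIZ[0-2] encodings for single-beat transfers of 1 to 4 bytes (Table 8-1).
TSIZ_FOR_BYTES = {1: 0b001, 2: 0b010, 3: 0b011, 4: 0b100}


def split_access(addr: int, nbytes: int = 4) -> list[tuple[int, int, int]]:
    """Split an access of 1 to 4 bytes at the word boundary it crosses, if any."""
    if not 1 <= nbytes <= 4:
        raise ValueError("this sketch handles accesses of 1 to 4 bytes")
    # Bytes that fit before the next word (4-byte aligned) boundary.
    first = min(nbytes, 4 - (addr % 4))
    transfers = [(addr & 0b111, first, TSIZ_FOR_BYTES[first])]
    if first < nbytes:
        rest = nbytes - first
        transfers.append(((addr + first) & 0b111, rest, TSIZ_FOR_BYTES[rest]))
    return transfers


if __name__ == "__main__":
    for start in range(8):
        print(f"4 bytes at offset {start:03b}:", split_access(start))
```

For a four-byte access at offset 0b001 this yields a three-byte transfer at 0b001 followed by a one-byte transfer at 0b100, which matches the first misaligned example in Table 8-5.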
Table 8-5. Misaligned Data Transfers (Four-Byte Examples)

                                                 Data Bus Byte Lanes
Transfer Size (Four Bytes)  TSIZ[0-2]  A[29-31]  0 1 2 3 4 5 6 7
Aligned                     1 0 0      0 0 0     A A A A - - - -
Misaligned, first access    0 1 1      0 0 1     - A A A - - - -
            second access   0 0 1      1 0 0     - - - - A - - -
Misaligned, first access    0 1 0      0 1 0     - - A A - - - -
            second access   0 1 0      1 0 0     - - - - A A - -
Misaligned, first access    0 0 1      0 1 1     - - - A - - - -
            second access   0 1 1      1 0 0     - - - - A A A -
Aligned                     1 0 0      1 0 0     - - - - A A A A
Misaligned, first access    0 1 1      1 0 1     - - - - - A A A
            second access   0 0 1      0 0 0     A - - - - - - -
Misaligned, first access    0 1 0      1 1 0     - - - - - - A A
            second access   0 1 0      0 0 0     A A - - - - - -
Misaligned, first access    0 0 1      1 1 1     - - - - - - - A
            second access   0 1 1      0 0 0     A A A - - - - -

Notes: A: byte lane used. -: byte lane not used.

8.3.2.5 Effect of Alignment in Data Transfers (32-Bit Bus)

The aligned data transfer cases for 32-bit data bus mode are shown in Table 8-6. All of the transfers require a single data beat (if caching-inhibited or write-through) except for double-word cases, which require two data beats. The double-word case is only generated by the 603e for load or store double operations to/from the floating-point GPRs (not supported on the EC603e microprocessor). All caching-inhibited instruction fetches are performed as word operations.
Table 8-6. Aligned Data Transfers (32-Bit Bus Mode)

                                              Data Bus Byte Lane(s)
Transfer Size  TSIZ0  TSIZ1  TSIZ2  A[29-31]  0 1 2 3 4 5 6 7
Byte           0      0      1      000       A - - - x x x x
               0      0      1      001       - A - - x x x x
               0      0      1      010       - - A - x x x x
               0      0      1      011       - - - A x x x x
               0      0      1      100       A - - - x x x x
               0      0      1      101       - A - - x x x x
               0      0      1      110       - - A - x x x x
               0      0      1      111       - - - A x x x x
Half word      0      1      0      000       A A - - x x x x
               0      1      0      010       - - A A x x x x
               0      1      0      100       A A - - x x x x
               0      1      0      110       - - A A x x x x
Word           1      0      0      000       A A A A x x x x
               1      0      0      100       A A A A x x x x
Double word    0      0      0      000       A A A A x x x x
  second beat  0      0      0      000       A A A A x x x x

Notes: A: byte lane used. -: byte lane not used. x: byte lane not used in 32-bit bus mode.

Misaligned data transfers when the 603e is configured with a 32-bit data bus operate in the same way as when configured with a 64-bit data bus, with the exception that only the DH[0-31] data bus is used. See Table 8-7 for an example of a 4-byte misaligned transfer starting at each possible byte address within a double word.
Table 8-7. Misaligned 32-Bit Data Bus Transfer (Four-Byte Examples)

                                                 Data Bus Byte Lanes
Transfer Size (Four Bytes)  TSIZ[0-2]  A[29-31]  0 1 2 3 4 5 6 7
Aligned                     1 0 0      0 0 0     A A A A x x x x
Misaligned, first access    0 1 1      0 0 1     - A A A x x x x
            second access   0 0 1      1 0 0     A - - - x x x x
Misaligned, first access    0 1 0      0 1 0     - - A A x x x x
            second access   0 1 0      1 0 0     A A - - x x x x
Misaligned, first access    0 0 1      0 1 1     - - - A x x x x
            second access   0 1 1      1 0 0     A A A - x x x x
Aligned                     1 0 0      1 0 0     A A A A x x x x
Misaligned, first access    0 1 1      1 0 1     - A A A x x x x
            second access   0 0 1      0 0 0     A - - - x x x x
Misaligned, first access    0 1 0      1 1 0     - - A A x x x x
            second access   0 1 0      0 0 0     A A - - x x x x
Misaligned, first access    0 0 1      1 1 1     - - - A x x x x
            second access   0 1 1      0 0 0     A A A - x x x x

Notes: A: byte lane used. -: byte lane not used. x: byte lane not used in 32-bit bus mode.

8.3.2.5.1 Alignment of External Control Instructions

The size of the data transfer associated with the eciwx and ecowx instructions is always 4 bytes. However, if the eciwx or ecowx instruction is misaligned and crosses any word boundary, the 603e will generate two bus operations, each with a size of fewer than 4 bytes. For the first bus operation, bits A[29-31] equal bits 29-31 of the effective address of the instruction, which will be 0b101, 0b110, or 0b111. The size associated with the first bus operation will be 3, 2, or 1 bytes, respectively. For the second bus operation, bits A[29-31] equal 0b000, and the size associated with the operation will be 1, 2, or 3 bytes, respectively.

For both operations, TBST and TSIZ[0-2] are redefined to specify the resource ID (RID). The resource ID is copied from bits 28-31 of the EAR. For eciwx/ecowx operations, the state of bit 28 of the EAR is presented by the TBST signal without inversion (if EAR[28] = 1, TBST = 1). The size of the second bus operation cannot be deduced from the operation itself; the system must determine how many bytes were transferred on the first bus operation to determine the size of the second operation.
Furthermore, the two bus operations associated with such a misaligned external control instruction are not atomic. That is, the 603e may initiate other types of memory operations between the two transfers. Also, the two bus operations associated with a misaligned ecowx may be interrupted by an eciwx bus operation, and vice versa. The 603e does guarantee that the two operations associated with a misaligned ecowx will not be interrupted by another ecowx operation; and likewise for eciwx.

Because a misaligned external control address is considered a programming error, the system may choose to assert TEA or otherwise cause an exception when a misaligned external control bus operation occurs. (The term "exception" is referred to as "interrupt" in the architecture specification.)

8.3.2.6 Transfer Code (TC[0-1]) Signals

The TC0 and TC1 signals provide supplemental information about the corresponding address. Note that the TCx signals can be used with the TT[0-4] and TBST signals to further define the current transaction. Table 8-8 shows the encodings of the TC0 and TC1 signals.

Table 8-8. Transfer Code Encoding

TC[0-1]  Read               Write
0 0      Data transaction   Any write
0 1      Touch load         N/A
1 0      Instruction fetch  N/A
1 1      (Reserved)         N/A

8.3.3 Address Transfer Termination

The address tenure of a bus operation is terminated when completed with the assertion of AACK, or retried with the assertion of ARTRY. The 603e does not terminate the address transfer until the AACK (address acknowledge) input is asserted; therefore, the system can extend the address transfer phase by delaying the assertion of AACK to the 603e.

Although AACK can be asserted as early as the bus clock cycle following TS (see Figure 8-7), which allows a minimum address tenure of two bus cycles when the 603e clock is configured for 1:1 (processor clock to bus clock) mode, the ARTRY snoop response cannot be determined in the minimum allowed address tenure period. When in 1:1 or 1.5:1 clock mode, AACK must not be asserted until the third clock of the address tenure (one address wait state) to allow the 603e an opportunity to assert ARTRY on the bus. For other clock configurations (2:1, 2.5:1, 3:1, 3.5:1, and 4:1), the ARTRY snoop response can be determined in the minimum address tenure period, and AACK may be asserted as early as the second bus clock of the address tenure.

As shown in Figure 8-7, these signals are asserted for one bus clock cycle, three-stated for half of the next bus clock cycle, driven high until the following bus cycle, and finally three-stated. Note that AACK must be asserted for only one bus clock cycle.
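A system model might capture the AACK timing rule with a lookup on the processor-to-bus clock ratio, as in the sketch below. The function name and the clock numbering (clock 1 is the bus clock in which TS is asserted) are assumptions made for illustration; the rule applies when the 603e must be given the opportunity to drive ARTRY as a snooper.

```python
# Processor-to-bus clock ratios listed in this section.
SUPPORTED_RATIOS = (1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0)


def earliest_aack_clock(ratio: float) -> int:
    """Earliest clock of the address tenure on which AACK may be asserted.

    Clock 1 is the bus clock in which TS is asserted (assumed numbering).
    """
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported clock ratio: {ratio}")
    # In 1:1 and 1.5:1 modes one address wait state is needed so that the
    # snoop response can be determined; otherwise the minimum tenure is enough.
    return 3 if ratio in (1.0, 1.5) else 2


if __name__ == "__main__":
    for r in SUPPORTED_RATIOS:
        print(f"{r}:1 -> AACK no earlier than clock {earliest_aack_clock(r)}")
```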
The address transfer can be terminated with the requirement to retry if ARTRY is asserted anytime during the address tenure and through the cycle following AACK. The assertion causes the entire transaction (address and data tenure) to be rerun. As a snooping device, the 603e asserts ARTRY for a snooped transaction that hits modified data in the data cache that must be written back to memory, or if the snooped transaction could not be serviced. As a bus master, the 603e responds to an assertion of ARTRY by aborting the bus transaction and re-requesting the bus. Note that after recognizing an assertion of ARTRY and aborting the transaction in progress, the 603e is not guaranteed to run the same transaction the next time it is granted the bus due to internal reordering of load and store operations.

If an address retry is required, the ARTRY response will be asserted by a bus snooping device as early as the second cycle after the assertion of TS (or not until the third cycle following TS if a 1:1 or 1.5:1 processor-to-bus clock ratio is selected). Once asserted, ARTRY must remain asserted through the cycle after the assertion of AACK. The assertion of ARTRY during the cycle after the assertion of AACK is referred to as a qualified ARTRY. An earlier assertion of ARTRY during the address tenure is referred to as an early ARTRY.

As a bus master, the 603e recognizes either an early or qualified ARTRY and prevents the data tenure associated with the retried address tenure. If the data tenure has already begun, the 603e aborts and terminates the data tenure immediately even if the burst data has been received. If the assertion of ARTRY is received up to or on the bus cycle following the first (or only) assertion of TA for the data tenure, the 603e ignores the first data beat, and if it is a load operation, does not forward data internally to the cache and execution units. If ARTRY is asserted after the first (or only) assertion of TA, improper operation of the bus interface may result.

During the clock of a qualified ARTRY, the 603e also determines if it should negate BR and ignore BG on the following cycle. On the following cycle, only the snooping master that asserted ARTRY and needs to perform a snoop copy-back operation is allowed to assert BR. This guarantees the snooping master an opportunity to request and be granted the bus before the just-retried master can restart its transaction. Note that a nonclocked bus arbiter may detect the assertion of address bus request by the bus master that asserted ARTRY, and return a qualified bus grant one cycle earlier than shown in Figure 8-7.
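The distinction between an early and a qualified ARTRY can be sketched as a classification by bus clock. This is a simplified model with hypothetical names: clocks are integers, and the early window is assumed to run from the clock after TS through the clock in which AACK is asserted, which glosses over the ratio-dependent earliest snoop response described above.

```python
def classify_artry(artry_clock: int, ts_clock: int, aack_clock: int) -> str:
    """Classify the first clock in which ARTRY is seen asserted."""
    if aack_clock <= ts_clock:
        raise ValueError("AACK must follow TS")
    if artry_clock == aack_clock + 1:
        return "qualified"       # the cycle after AACK
    if ts_clock < artry_clock <= aack_clock:
        return "early"           # during the address tenure (assumed window)
    return "outside window"      # not a retry of this address tenure


if __name__ == "__main__":
    # TS in clock 1, AACK delayed to clock 3 (one address wait state).
    for clock in range(1, 6):
        print(clock, classify_artry(clock, ts_clock=1, aack_clock=3))
```

Because an early ARTRY must stay asserted through the cycle after AACK, a master that samples only the qualified clock would still see the retry.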
[Figure 8-7. Snooped Address Cycle with ARTRY. Signals shown: TS, ABB, ADDR, AACK, ARTRY, BR, qual BG, ABB; bus clocks 1 to 8.]

8.4 Data Bus Tenure

This section describes the data bus arbitration, transfer, and termination phases defined by the 603e memory access protocol. The phases of the data tenure are identical to those of the address tenure, underscoring the symmetry in the control of the two buses.

8.4.1 Data Bus Arbitration

Data bus arbitration uses the data arbitration signal group: DBG, DBWO, and DBB. Additionally, the combination of TS and TT[0-4] provides information about the data bus request to external logic.

The TS signal is an implied data bus request from the 603e; the arbiter must qualify TS with the transfer type (TT) encodings to determine if the current address transfer is an address-only operation, which does not require a data bus transfer (see Figure 8-7). If the data bus is needed, the arbiter grants data bus mastership by asserting the DBG input to the 603e. As with the address bus arbitration phase, the 603e must qualify the DBG input with a number of input signals before assuming bus mastership, as shown in Figure 8-8.
[Figure 8-8. Data Bus Arbitration. Signals shown: TS, DBG, DBB, DRTRY, qual DBG, DBB; bus clocks 0 to 3.]

A qualified data bus grant can be expressed as the following:

QDBG = DBG asserted while DBB, DRTRY, and ARTRY (associated with the data bus operation) are negated.

When a data tenure overlaps with its associated address tenure, a qualified ARTRY assertion coincident with a data bus grant signal does not result in data bus mastership (DBB is not asserted). Otherwise, the 603e always asserts DBB on the bus clock cycle after recognition of a qualified data bus grant. Since the 603e can pipeline transactions, there may be an outstanding data bus transaction when a new address transaction is retried. In this case, the 603e becomes the data bus master to complete the previous transaction.

8.4.1.1 Using the DBB Signal

The DBB signal should be connected between masters if data tenure scheduling is left to the masters. Optionally, the memory system can control data tenure scheduling directly with DBG. However, it is possible to ignore the DBB signal in the system if the DBB input is not used as the final data bus allocation control between data bus masters, and if the memory system can track the start and end of the data tenure. If DBB is not used to signal the end of a data tenure, DBG is only asserted to the next bus master the cycle before the cycle that the next bus master may actually begin its data tenure, rather than asserting it earlier (usually during another master's data tenure) and allowing the negation of DBB to be the final gating signal for a qualified data bus grant.

Even if DBB is ignored in the system, the 603e always recognizes its own assertion of DBB, and requires one cycle after data tenure completion to negate its own DBB before recognizing a qualified data bus grant for another data tenure. If DBB is ignored in the system, it must still be connected to a pull-up resistor on the 603e to ensure proper operation.
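The QDBG expression translates directly into a predicate. In this minimal sketch each signal is a boolean where True means "asserted", the function names are hypothetical, and the `artry` argument stands for an ARTRY associated with the data bus operation being granted.

```python
def qualified_data_bus_grant(dbg: bool, dbb: bool, drtry: bool, artry: bool) -> bool:
    """QDBG: DBG asserted while DBB, DRTRY, and the associated ARTRY are negated."""
    return dbg and not (dbb or drtry or artry)


def asserts_dbb_next_clock(has_pending_tenure: bool, dbg: bool, dbb: bool,
                           drtry: bool, artry: bool) -> bool:
    """Whether the master drives DBB on the clock after this one (simplified)."""
    return has_pending_tenure and qualified_data_bus_grant(dbg, dbb, drtry, artry)


if __name__ == "__main__":
    print(qualified_data_bus_grant(dbg=True, dbb=False, drtry=False, artry=False))
    # A grant that arrives while another master still holds DBB is not qualified.
    print(qualified_data_bus_grant(dbg=True, dbb=True, drtry=False, artry=False))
```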
8.4.2 Data Bus Write Only

As a result of address pipelining, the 603e may have up to two data tenures queued to perform when it receives a qualified DBG. Generally, the data tenures should be performed in strict order (the same order) as their address tenures were performed. The 603e, however, also supports a limited out-of-order capability with the data bus write only (DBWO) input. When recognized on the clock of a qualified DBG, DBWO may direct the 603e to perform the next pending data write tenure even if a pending read tenure would have normally been performed first. For more information on the operation of DBWO, refer to Section 8.10, "Using Data Bus Write Only."

If the 603e has any data tenures to perform, it always accepts data bus mastership to perform a data tenure when it recognizes a qualified DBG. If DBWO is asserted with a qualified DBG and no write tenure is queued to run, the 603e still takes mastership of the data bus to perform the next pending read data tenure.

Generally, DBWO should only be used to allow a copy-back operation (burst write) to occur before a pending read operation. If DBWO is used for single-beat write operations, it may negate the effect of the eieio instruction by allowing a write operation to precede a program-scheduled read operation.

8.4.3 Data Transfer

The data transfer signals include DH[0-31], DL[0-31], DP[0-7], and DPE. For memory accesses, the DH and DL signals form a 64-bit data path for read and write operations.

The 603e transfers data in either single- or four-beat burst transfers when configured with a 64-bit data bus; when configured with a 32-bit data bus, the 603e performs one-, two-, and eight-beat data transfers.
Single-beat operations can transfer from 1 to 8 bytes at a time and can be misaligned; see Section 8.3.2.4, "Effect of Alignment in Data Transfers (64-Bit Bus)." Note that the EC603e microprocessor can transfer from 1 to 4 bytes during single-beat operations. Burst operations always transfer eight words and are aligned on eight-word address boundaries. Burst transfers can achieve significantly higher bus throughput than single-beat operations.

The type of transaction initiated by the 603e depends on whether the code or data is cacheable and, for store operations, whether the cache is considered in write-back or write-through mode, which software controls on either a page or block basis. Burst transfers support cacheable operations only; that is, memory structures must be marked as cacheable (and write-back for data store operations) in the respective page or block descriptor to take advantage of burst transfers.

The 603e output TBST indicates to the system whether the current transaction is a single- or four-beat transfer (except during eciwx/ecowx transactions, when it signals the state of EAR[28]). A burst transfer has an assumed address order. For load or store operations that miss in the cache (and are marked as cacheable and, for stores, write-back in the MMU), the 603e uses the double-word-aligned address associated with the critical code or data that initiated the transaction. This minimizes latency by allowing the critical code or data to be forwarded to the processor before the rest of the cache line is filled. For all other burst operations, however, the cache line is transferred beginning with the oct-word-aligned data.

The 603e does not directly support dynamic interfacing to subsystems with less than a 64-bit data path. It does, however, provide a static 32-bit data bus mode; for more information, see Section 8.1.2.1, "Optional 32-Bit Data Bus Mode."

8.4.4 Data Transfer Termination

Four signals are used to terminate data bus transactions: TA, DRTRY (data retry), TEA (transfer error acknowledge), and ARTRY. The TA signal indicates normal termination of data transactions. It must always be asserted on the bus cycle coincident with the data that it is qualifying. It may be withheld by the slave for any number of clocks until valid data is ready to be supplied or accepted. DRTRY indicates invalid read data in the previous bus clock cycle. DRTRY extends the current data beat and does not terminate it. If it is asserted after the last (or only) data beat, the 603e negates DBB but still considers the data beat active and waits for another assertion of TA. DRTRY is ignored on write operations. TEA indicates a nonrecoverable bus error event. Upon receiving a final (or only) termination condition, the 603e always negates DBB for one cycle.

If DRTRY is asserted by the memory system to extend the last (or only) data beat past the negation of DBB, the memory system should three-state the data bus on the clock after the final assertion of TA, even though it will negate DRTRY on that clock. This is to prevent a potential momentary data bus conflict if a write access begins on the following cycle.

The TEA signal is used to signal a nonrecoverable error during the data transaction.
TEA may be asserted on any cycle during DBB, or on the cycle after a qualified TA during a read operation, except when no-DRTRY mode is selected (where no-DRTRY mode cancels checking the cycle after TA). The assertion of TEA terminates the data tenure immediately even if in the middle of a burst; however, it does not prevent incorrect data that has just been acknowledged with TA from being written into the 603e's cache or GPRs. The assertion of TEA initiates either a machine check exception or a checkstop condition based on the setting of the MSR.

An assertion of ARTRY causes the data tenure to be terminated immediately if the ARTRY is for the address tenure associated with the data tenure in operation. If ARTRY is connected for the 603e, the earliest allowable assertion of TA to the 603e is directly dependent on the earliest possible assertion of ARTRY to the 603e; see Section 8.3.3, "Address Transfer Termination."

If the 603e clock is configured for 1:1 or 1.5:1 (processor clock to bus clock ratio) mode and the 603e is performing a burst read into its data cache, at least one wait state must be provided between the assertion of TS and the first assertion of TA for that transaction. If no-DRTRY mode is also selected, at least two wait states must be provided. The wait states are required due to possible resource contention in the data cache caused by a block replacement (or cast-out) required in connection with the new line fill. These wait states may be provided by withholding the assertion of TA to the 603e for that data tenure, or by withholding DBG to the 603e, thereby delaying the start of the data tenure. This restriction applies only to burst reads into the data cache when configured in 1:1 or 1.5:1 clock modes. (It does not apply to instruction fetches, write operations, noncacheable read operations, or clock modes other than 1:1 and 1.5:1.)

8.4.4.1 Normal Single-Beat Termination

Normal termination of a single-beat data read operation occurs when TA is asserted by a responding slave. The TEA and DRTRY signals must remain negated during the transfer (see Figure 8-9).

[Figure 8-9. Normal Single-Beat Read Termination. Signals shown: TS, qual DBG, DBB, data, TA, DRTRY, AACK; bus clocks 0 to 4.]
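The wait-state restriction on burst reads into the data cache (Section 8.4.4) can be expressed as a small rule, sketched below. The function and parameter names are hypothetical, and the value returned covers this one restriction only; other timing requirements, such as the ARTRY-related limit on the earliest TA, still apply separately.

```python
def min_ta_wait_states(ratio: float, burst_read_into_dcache: bool,
                       no_drtry_mode: bool) -> int:
    """Wait states required between TS and the first TA by this restriction."""
    if not burst_read_into_dcache:
        # Instruction fetches, writes, and noncacheable reads are exempt.
        return 0
    if ratio not in (1.0, 1.5):
        # Only the 1:1 and 1.5:1 processor-to-bus clock modes are affected.
        return 0
    return 2 if no_drtry_mode else 1


if __name__ == "__main__":
    print(min_ta_wait_states(1.0, True, False))
    print(min_ta_wait_states(1.5, True, True))
    print(min_ta_wait_states(2.0, True, True))
```

The wait states can come either from withholding TA or from withholding DBG, so a memory controller model would count clocks from TS regardless of which signal supplies the delay.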
The DRTRY signal is not sampled during data writes, as shown in Figure 8-10.

[Figure 8-10. Normal Single-Beat Write Termination. Signals shown: TS, qual DBG, DBB, data, TA, DRTRY, AACK; bus clocks 0 to 3.]

Normal termination of a burst transfer occurs when TA is asserted for four bus clock cycles, as shown in Figure 8-11. The bus clock cycles in which TA is asserted need not be consecutive, thus allowing pacing of the data transfer beats. For read bursts to terminate successfully, TEA and DRTRY must remain negated during the transfer. For write bursts, TEA must remain negated for a successful transfer. DRTRY is ignored during data writes.

[Figure 8-11. Normal Burst Transaction. Signals shown: TS, qual DBG, DBB, data, TA, DRTRY; bus clocks 1 to 7.]
For read bursts, DRTRY may be asserted one bus clock cycle after TA is asserted to signal that the data presented with TA is invalid and that the processor must wait for the negation of DRTRY before forwarding data to the processor (see Figure 8-12). Thus, a data beat can be terminated by a predicted branch with TA and then one bus clock cycle later confirmed with the negation of DRTRY. The DRTRY signal is valid only for read transactions. TA must be asserted on the bus clock cycle before the first bus clock cycle of the assertion of DRTRY; otherwise the results are undefined.

The DRTRY signal extends data bus mastership such that other processors cannot use the data bus until DRTRY is negated. Therefore, in the example in Figure 8-12, DBB cannot be asserted until bus clock cycle 5. This is true for both read and write operations even though DRTRY does not extend bus mastership for write operations.

[Figure 8-12. Termination with DRTRY. Signals shown: TS, qual DBG, DBB, data, TA, DRTRY; bus clocks 1 to 5.]

Figure 8-13 shows the effect of using DRTRY during a burst read. It also shows the effect of using TA to pace the data transfer rate. Notice that in bus clock cycle 3 of Figure 8-13, TA is negated for the second data beat. The 603e data pipeline does not proceed until bus clock cycle 4, when TA is reasserted.

Note that DRTRY is useful for systems that implement predicted forwarding of data, such as those with direct-mapped, second-level caches where hit/miss is determined on the following bus clock cycle, or for parity- or ECC-checked memory systems.

Note that DRTRY may not be implemented on other PowerPC processors.
8.4.4.2 data transfer termination due to a bus error

the tea signal indicates that a bus error occurred. it may be asserted while dbb (and/or drtry for read operations) is asserted. asserting tea to the 603e terminates the transaction; that is, further assertions of ta and drtry are ignored and dbb is negated; see figure 8-13.

figure 8-13. read burst with ta wait states and drtry

assertion of the tea signal causes a machine check exception (and possibly a checkstop condition within the 603e). for more information, see section 4.5.2, "machine check exception (0x00200)." note also that the 603e does not implement a synchronous error capability for memory accesses. this means that the exception instruction pointer does not point to the memory operation that caused the assertion of tea, but to the instruction about to be executed (perhaps several instructions later). however, assertion of tea does not invalidate data entering the gpr or the cache. additionally, the corresponding address of the access that caused tea to be asserted is not latched by the 603e. to recover, the exception handler must determine and remedy the cause of the tea, or the 603e must be reset; therefore, this function should only be used to flag fatal system conditions to the processor (such as parity or uncorrectable ecc errors).

after the 603e has committed to run a transaction, that transaction must eventually complete. address retry causes the transaction to be restarted; ta wait states and drtry assertion for reads delay termination of individual data beats. eventually, however, the system must either terminate the transaction or assert the tea signal (and vector the 603e into a machine check exception). for this reason, care must be taken to check for the end of physical memory and the location of certain system facilities to avoid memory accesses that result in the generation of machine check exceptions.
note that tea generates a machine check exception depending on the me bit in the msr. clearing the machine check exception enable control bits leads to a true checkstop condition (instruction execution halted and processor clock stopped).

8.4.5 memory coherency: mei protocol

the 603e provides dedicated hardware to provide memory coherency by snooping bus transactions. the address retry capability enforces the three-state, mei cache-coherency protocol (see figure 8-14).

the global (gbl) output signal indicates whether the current transaction must be snooped by other snooping devices on the bus. address bus masters assert gbl to indicate that the current transaction is a global access (that is, an access to memory shared by more than one device). if gbl is not asserted for the transaction, that transaction is not snooped. when other devices detect the gbl input asserted, they must respond by snooping the broadcast address.

normally, gbl reflects the m bit value specified for the memory reference in the corresponding translation descriptor(s). note that care must be taken to minimize the number of pages marked as global, because the retry protocol discussed in the previous section is used to enforce coherency and can require significant bus bandwidth.

when the 603e is not the address bus master, gbl is an input. the 603e snoops a transaction if ts and gbl are asserted together in the same bus clock cycle (this is a qualified snooping condition). no snoop update to the 603e cache occurs if the snooped transaction is not marked global. this includes invalidation cycles.

when the 603e detects a qualified snoop condition, the address associated with the ts is compared against the data cache tags. snooping completes if no hit is detected.
if, however, the address hits in the cache, the 603e reacts according to the mei protocol shown in figure 8-14, assuming the wim bits are set to write-back, caching-allowed, and coherency-enforced modes (wim = 001).

the 603e's on-chip data cache is implemented as a four-way set-associative cache. to facilitate external monitoring of the internal cache tags, the cache set entry (cse[0-1]) signals indicate which cache set is being replaced on read operations. note that these signals are valid only for 603e burst operations; for all other bus operations, the cse[0-1] signals should be ignored.
figure 8-14. mei cache coherency protocol: state diagram (wim = 001)
states: modified, exclusive, invalid
bus transactions:
  sh = snoop hit
  rh = read hit
  wh = write hit
  wm = write miss
  rm = read miss
  sh/crw = snoop hit, cacheable read/write
  sh/cir = snoop hit, cache-inhibited read
the diagram also marks which transitions involve a snoop push and which involve a cache line fill.

table 8-9 shows the cse encodings.

table 8-9. cse[0-1] signals

cse[0-1]   cache set element
00         set 0
01         set 1
10         set 2
11         set 3
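as a small illustration of table 8-9, external monitoring logic could decode cse[0-1] as follows. this is a sketch only: the function name, the integer encoding of the two signal bits, and the use of None for "ignore" are assumptions made for the example.

```python
# table 8-9: cse[0-1] encoding -> cache set being replaced
CSE_TO_SET = {0b00: 0, 0b01: 1, 0b10: 2, 0b11: 3}


def replaced_cache_set(cse, is_burst):
    """return the set number signalled on cse[0-1], or None if not valid.

    the cse signals are valid only for 603e burst operations, so any other
    bus operation yields None (meaning: ignore the signals).
    """
    if not is_burst:
        return None
    return CSE_TO_SET[cse & 0b11]
```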
8.5 timing examples

this section shows timing diagrams for various scenarios. figure 8-15 illustrates the fastest single-beat reads possible for the 603e. this figure shows both minimal latency and maximum single-beat throughput. by delaying the data bus tenure, the latency increases, but, because of split-transaction pipelining, the overall throughput is not affected unless the data bus latency causes the third address tenure to be delayed. note that all bidirectional signals are three-stated between bus tenures.

figure 8-15. fastest single-beat reads
(signals shown: br, bg, abb, ts, a[0-31], tt[0-4], tbst, gbl, aack, artry, dbg, dbb, d[0-63], ta, drtry, tea; three single-beat read tenures by cpu a)
figure 8-16 illustrates the fastest single-beat writes supported by the 603e. all bidirectional signals are three-stated between bus tenures.

figure 8-16. fastest single-beat writes
(signals shown: br, bg, abb, ts, a[0-31], tt[0-4], tbst, gbl, aack, artry, dbg, dbb, d[0-63], ta, drtry, tea; three single-beat write tenures by cpu a)
figure 8-17 shows three ways to delay single-beat reads using data-delay controls:
- the ta signal can remain negated to insert wait states in clock cycles 3 and 4.
- for the second access, dbg could have been asserted in clock cycle 6.
- in the third access, drtry is asserted in clock cycle 11 to flush the previous data.

note that all bidirectional signals are three-stated between bus tenures. the pipelining shown in figure 8-17 can occur if the second access is not another load (for example, an instruction fetch).

figure 8-17. single-beat reads showing data-delay controls
(signals shown: br, bg, abb, ts, a[0-31], tt[0-4], tbst, gbl, aack, artry, dbg, dbb, d[0-63], ta, drtry, tea; three single-beat read tenures by cpu a)
figure 8-18 shows data-delay controls in a single-beat write operation. note that all bidirectional signals are three-stated between bus tenures. data transfers are delayed in the following ways:
- the ta signal is held negated to insert wait states in clocks 3 and 4.
- in clock 6, dbg is held negated, delaying the start of the data tenure.

the last access is not delayed (drtry is valid only for read operations).

figure 8-18. single-beat writes showing data-delay controls
(signals shown: br, bg, abb, ts, a[0-31], tt[0-4], tbst, gbl, aack, artry, dbg, dbb, d[0-63], ta, drtry, tea; three single-beat write tenures by cpu a)
figure 8-19 shows the use of data-delay controls with burst transfers. note that all bidirectional signals are three-stated between bus tenures. note the following:
- the first data beat of bursted read data (clock 0) is the critical quad word.
- the write burst shows the use of ta signal negation to delay the third data beat.
- the final read burst shows the use of drtry on the third data beat.
- the address for the third transfer is delayed until the first transfer completes.

figure 8-19. burst transfers with data-delay controls
(signals shown: br, bg, abb, ts, a[0-31], tt[0-4], tbst, gbl, aack, artry, dbg, dbb, d[0-63], ta, drtry, tea; a read burst, a write burst, and a second read burst by cpu a)
figure 8-20 shows the use of the tea signal. note that all bidirectional signals are three-stated between bus tenures. note the following:
- the first data beat of the read burst (in clock 0) is the critical quad word.
- the tea signal truncates the burst write transfer on the third data beat.
- the 603e eventually causes an exception to be taken on the tea event.

figure 8-20. use of transfer error acknowledge (tea)
(signals shown: br, bg, abb, ts, a[0-31], tt[0-4], tbst, gbl, aack, artry, dbg, dbb, d[0-63], ta, drtry, tea; a read burst, a truncated write burst, and a second read burst by cpu a)
8.6 optional bus configurations

the 603e supports three optional bus configurations that are selected by the assertion or negation of the drtry, tlbisync, and qack signals during the negation of the hreset signal. the operation and selection of the optional bus configurations are described in the following sections.

8.6.1 32-bit data bus mode

the 603e supports an optional 32-bit data bus mode. the 32-bit data bus mode operates the same as the 64-bit data bus mode with the exception of the byte lanes involved in the transfer and the number of data beats that are performed. when in 32-bit data bus mode, only byte lanes 0 through 3 are used, corresponding to dh0-dh31 and dp0-dp3. byte lanes 4 through 7, corresponding to dl0-dl31 and dp4-dp7, are never used in this mode. the unused data bus signals are not sampled by the 603e during read operations, and they are driven low during write operations.

the number of data beats required for a data tenure in the 32-bit data bus mode is one, two, or eight beats depending on the size of the program transaction and the cache mode for the address. data transactions of one or two data beats are performed for caching-inhibited load/store or write-through store operations. these transactions do not assert the tbst signal even though a two-beat burst may be performed (having the same tbst and tsiz[0-2] encodings as the 64-bit data bus mode). single-beat data transactions are performed for bus operations of 4 bytes or less, and double-beat data transactions are performed for 8-byte operations only. the 603e only generates an 8-byte operation for a double-word-aligned load or store double operation to or from the floating-point gprs (not supported on the EC603E microprocessor). all cache-inhibited instruction fetches are performed as word (single-beat) operations.
data transactions of eight data beats are performed for burst operations that load into or store from the 603e's internal caches. these transactions transfer 32 bytes in the same way as in 64-bit data bus mode, asserting the tbst signal, and signaling a transfer size of 2 (tsiz[0-2] = 0b010).

the same bus protocols apply for arbitration, transfer, and termination of the address and data tenures in the 32-bit data bus mode as they apply to the 64-bit data bus mode. late artry cancellation of the data tenure applies on the bus clock after the first data beat is acknowledged (after the first ta) for word or smaller transactions, or on the bus clock after the second data beat is acknowledged (after the second ta) for double-word or burst operations (or coincident with the respective ta if no-drtry mode is selected).

an example of an eight-beat data transfer while the 603e is in 32-bit data bus mode is shown in figure 8-21.
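the beat counts described for 32-bit data bus mode can be summarized in a short python sketch. the function and constant names are hypothetical, and rejecting sizes other than 1 to 4 bytes and 8 bytes is an assumption based on the statement that double-beat transactions are performed for 8-byte operations only.

```python
BYTES_PER_BEAT_32 = 4   # byte lanes 0 through 3 only
CACHE_LINE_BYTES = 32   # a burst transfers 32 bytes


def beats_32bit_mode(size_bytes, burst):
    """number of data beats for one data tenure in 32-bit data bus mode."""
    if burst:
        # cache line fill or cast-out: 32 bytes over a 4-byte bus
        return CACHE_LINE_BYTES // BYTES_PER_BEAT_32
    if size_bytes in (1, 2, 3, 4):
        return 1  # single beat for operations of 4 bytes or less
    if size_bytes == 8:
        return 2  # double beat, tbst not asserted
    raise ValueError("size not covered by the text: %r" % (size_bytes,))
```

for comparison, the text for 64-bit mode describes a burst as terminating after four assertions of ta, so the same 32 bytes take twice as many beats on the narrower bus.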
figure 8-21. 32-bit data bus transfer (eight-beat burst)
(signals shown: ts, abb, addr, tbst, aack, artry, dbb, dh[0-31], ta, drtry, tea)

an example of a two-beat data transfer (with drtry asserted during each data tenure) is shown in figure 8-22.

figure 8-22. 32-bit data bus transfer (two-beat burst with drtry)
(signals shown: ts, abb, addr, tbst, aack, artry, dbb, dh[0-31], ta, drtry, tea)

the 603e selects 64-bit or 32-bit data bus mode at startup by sampling the state of the tlbisync signal at the negation of hreset. if the tlbisync signal is negated at the negation of hreset, 64-bit data mode is entered by the 603e. if tlbisync is asserted at the negation of hreset, 32-bit data mode is entered.
8.6.2 no-drtry mode

the 603e supports an optional mode to disable the use of the data retry function provided through the drtry signal. the no-drtry mode allows the forwarding of data during load operations to the internal cpu one bus cycle sooner than in the normal bus protocol.

the powerpc bus protocol specifies that, during load operations, the memory system normally has the capability to cancel data that was read by the master on the bus cycle after ta was asserted. in the 603e implementation, this late cancellation protocol requires the 603e to hold any loaded data at the bus interface for one additional bus clock to verify that the data is valid before forwarding it to the internal cpu. for systems that do not implement the drtry function, the 603e provides an optional no-drtry mode that eliminates this one-cycle stall during all load operations, and allows for the forwarding of data to the internal cpu immediately when ta is recognized.

when the 603e is in the no-drtry mode, data can no longer be cancelled the cycle after it is acknowledged by an assertion of ta. data is immediately forwarded to the cpu internally, and any attempt at late cancellation by the system may cause improper operation by the 603e.

when the 603e is following normal bus protocol, data may be cancelled the bus cycle after ta by either of two means: late cancellation by drtry, or late cancellation by artry. when no-drtry mode is selected, both cancellation cases must be disallowed in the system design for the bus protocol.

when no-drtry mode is selected for the 603e, the system must ensure that drtry will not be asserted to the 603e. if it is asserted, it may cause improper operation of the bus interface. the system must also ensure that any assertion of artry by a snooping device occurs before or coincident with the first assertion of ta to the 603e, but not on the cycle after the first assertion of ta.
other than the inability to cancel data that was read by the master on the bus cycle after ta was asserted, the bus protocol for the 603e is identical to that for the basic transfer bus protocols described in this chapter, as well as for 32-bit data bus mode.

the 603e selects the desired drtry mode at startup by sampling the state of the drtry signal itself at the negation of the hreset signal. if the drtry signal is negated at the negation of hreset, normal operation is selected. if the drtry signal is asserted at the negation of hreset, no-drtry mode is selected.

8.6.3 reduced-pinout mode

the 603e provides an optional reduced-pinout mode. this mode idles the switching of numerous signals for reduced power consumption. the dl[0-31], dp[0-7], ap[0-3], ape, dpe, and rsrv signals are disabled when the reduced-pinout mode is selected. note that the 32-bit data bus mode is implicitly selected when the reduced-pinout mode is enabled.
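the three reset-time selections of section 8.6 can be collected into one python sketch. the class and function names are hypothetical, and the booleans stand for the logical state (asserted or negated) of each signal as sampled at the negation of hreset, not for electrical levels. the qack rule used here (asserted selects full pinout, negated selects reduced pinout) is the one stated in the remainder of section 8.6.3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusConfig:
    data_bus_bits: int     # 64 or 32
    drtry_enabled: bool    # False means no-drtry mode
    reduced_pinout: bool


def sample_reset_config(drtry_asserted, tlbisync_asserted, qack_asserted):
    """decode the optional bus configuration sampled at hreset negation."""
    reduced = not qack_asserted            # qack negated: reduced pinout
    # tlbisync asserted selects 32-bit mode; reduced pinout implies it too
    bus32 = tlbisync_asserted or reduced
    return BusConfig(
        data_bus_bits=32 if bus32 else 64,
        drtry_enabled=not drtry_asserted,  # drtry asserted: no-drtry mode
        reduced_pinout=reduced,
    )
```

a frozen dataclass is used so that the sampled configuration reads as a value fixed at reset, which mirrors how the text describes these modes.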
when in the reduced-pinout mode, the disabled bidirectional and output signals are always driven low during the periods when they would normally have been driven by the 603e. the open-drain outputs (ape and dpe) are always three-stated. the bidirectional inputs are always turned off at the input receivers of the 603e and are not sampled.

the 603e selects either full-pinout or reduced-pinout mode at startup by sampling the state of the qack signal at the negation of hreset. if the qack signal is asserted at the negation of hreset, full-pinout mode is selected by the 603e. if qack is negated at the negation of hreset, reduced-pinout mode is selected.

8.7 interrupt, checkstop, and reset signals

this section describes external interrupts, checkstop operations, and hard and soft reset inputs.

8.7.1 external interrupts

the external interrupt input signals (int, smi, and mcp) of the 603e eventually force the processor to take the external interrupt vector, or the system management interrupt vector if msr[ee] is set, or the machine check interrupt if the msr[me] bit and the hid0[emcp] bit are set.

8.7.2 checkstops

the 603e has two checkstop input signals: ckstp_in (non-maskable) and mcp (enabled when msr[me] is cleared and hid0[emcp] is set), and a checkstop output (ckstp_out). if ckstp_in or mcp is asserted, the 603e halts operations by gating off all internal clocks. the 603e asserts ckstp_out if ckstp_in is asserted.

if checkstop is asserted by the 603e, it has entered the checkstop state, and processing has halted internally. the checkstop signal can be asserted for various reasons, including receiving a tea signal and detection of external parity errors. for more information about checkstop state, see section 4.5.2.2, "checkstop state (msr[me] = 0)."
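the way an mcp assertion is steered by msr[me] and hid0[emcp] can be written out as a small decision function. this is a sketch with hypothetical names; the "ignored" result for hid0[emcp] = 0 is an inference from the text (mcp is described as enabled only when emcp is set), not a statement taken from it.

```python
def mcp_response(msr_me, hid0_emcp):
    """classify the effect of asserting mcp, per sections 8.7.1 and 8.7.2."""
    if not hid0_emcp:
        return "ignored"        # assumption: mcp has no effect when disabled
    if msr_me:
        return "machine check"  # msr[me] = 1 and hid0[emcp] = 1
    return "checkstop"          # msr[me] = 0 and hid0[emcp] = 1


def ckstp_in_response():
    """ckstp_in is non-maskable and always leads to the checkstop state."""
    return "checkstop"
```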
8.7.3 reset inputs

the 603e has two reset inputs, described as follows:
- hreset (hard reset): the hreset signal is used for power-on reset sequences, or for situations in which the 603e must go through the entire cold-start sequence of internal hardware initializations.
- sreset (soft reset): the soft reset input provides warm reset capability. this input can be used to avoid forcing the 603e to complete the cold-start sequence.

when either reset input is negated, the processor attempts to fetch code from the system reset exception vector. the vector is located at offset 0x00100 from the exception prefix (all zeros or ones, depending on the setting of the exception prefix bit in the machine state register, msr[ip]). the ip bit is set for hreset.
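as a worked example of the vector calculation, the sketch below forms a 32-bit physical vector address from an exception offset and msr[ip]. the function name is hypothetical; the prefix values follow the powerpc convention that the exception prefix occupies the upper 12 address bits (0x000 when ip = 0, 0xfff when ip = 1).

```python
SYSTEM_RESET_OFFSET = 0x00100


def exception_vector(offset, msr_ip):
    """return the 32-bit vector address for a 20-bit exception offset."""
    prefix = 0xFFF00000 if msr_ip else 0x00000000
    return prefix | (offset & 0x000FFFFF)
```

because the ip bit is set for hreset, the first fetch after a hard reset in this model is from 0xfff00100.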
8.7.4 system quiesce control signals

the system quiesce control signals (qreq and qack) allow the processor to enter a low-power state and bring bus activity to a quiescent state in an orderly fashion.

the system quiesce state is entered by asserting the qreq signal. this signal allows the system to terminate or pause any bus activities that are normally snooped. when the system is ready to enter the system quiesce state, it asserts the qack signal. at this time the 603e may enter a quiescent (low-power) state. when the 603e is in the quiescent state, it stops snooping bus activity.

8.8 processor state signals

this section describes the 603e's support for atomic memory updates through the use of the lwarx/stwcx. opcode pair, and includes a description of the 603e tlbisync input.

8.8.1 support for the lwarx/stwcx. instruction pair

the load word and reserve indexed (lwarx) and the store word conditional indexed (stwcx.) instructions provide a means for atomic memory updating. memory can be updated atomically by setting a reservation on the load and checking that the reservation is still valid before the store is performed. in the 603e, the reservations are made on behalf of aligned, 32-byte sections of the memory address space.

the reservation (rsrv) output signal is driven synchronously with the bus clock and reflects the status of the reservation coherency bit in the reservation address register (see chapter 3, "instruction and data cache operation," for more information). see section 7.2.9.7.3, "reservation (rsrv): output," for information about timing.

8.8.2 tlbisync input

the tlbisync input allows for the hardware synchronization of changes to mmu tables when the 603e and another dma master share the same mmu translation tables in system memory. it is asserted by a dma master when it is using shared addresses that could be changed in the mmu tables by the 603e during the dma master's tenure.
the tlbisync input, when asserted to the 603e, prevents the 603e from completing any instructions past a tlbsync instruction. generally, during the execution of an eciwx or ecowx instruction by the 603e, the selected dma device should assert the 603e's tlbisync signal and maintain it asserted during its dma tenure if it is using a shared translation address. subsequent instructions by the 603e should include a sync and tlbsync instruction before any mmu table changes are performed. this will prevent the 603e from making table changes disruptive to the other master during the dma period.
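returning to the reservation granularity described in section 8.8.1: because reservations cover aligned, 32-byte sections of the address space, two addresses share a reservation exactly when they fall in the same 32-byte block. the sketch below shows that address arithmetic only; the names are hypothetical and it does not model the reservation address register or the rsrv signal.

```python
RESERVATION_GRANULE = 32  # bytes, aligned


def reservation_block(addr):
    """base address of the 32-byte block that contains addr."""
    return (addr & 0xFFFFFFFF) & ~(RESERVATION_GRANULE - 1)


def same_reservation(addr_a, addr_b):
    """true if both addresses fall in the same reservation block."""
    return reservation_block(addr_a) == reservation_block(addr_b)
```

one practical consequence is that a store by another master to any word of the block, not just the reserved word, falls within the reserved section.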
8.9 ieee 1149.1-compliant interface

the 603e boundary-scan interface is a fully compliant implementation of the ieee 1149.1 standard. this section describes the 603e ieee 1149.1 (jtag) interface.

8.9.1 ieee 1149.1 interface description

the 603e has five dedicated jtag signals, which are described in table 8-10. the tdi and tdo scan ports are used to scan instructions as well as data into the various scan registers for jtag operations. the scan operation is controlled by the test access port (tap) controller, which in turn is controlled by the tms input sequence. the scan data is latched in at the rising edge of tck.

table 8-10. ieee interface pin descriptions

signal name   input/output   weak pullup provided   ieee 1149.1 function
tdi           input          yes                    serial scan input signal
tdo           output         no                     serial scan output signal
tms           input          yes                    tap controller mode signal
tck           input          yes                    scan clock
trst          input          yes                    tap controller reset

trst is a jtag optional signal which is used to reset the tap controller asynchronously. the trst signal assures that the jtag logic does not interfere with the normal operation of the chip, and can be asserted coincident with hreset.

the pid7v-603e implements the jtag/cop in the same manner as does the pid6-603e implementation, with the exception of the introduction of the 33-bit run_n counter register, in which the most-significant 32 bits form a 32-bit counter. the function of the least-significant bit remains unchanged. the run_n counter is used by the cop to control the number of processor cycles that the processor runs before halting.

8.10 using data bus write only

the 603e supports split-transaction pipelined transactions. it supports a limited out-of-order capability for its own pipelined transactions through the data bus write only (dbwo) signal. when recognized on the clock of a qualified dbg, the assertion of dbwo directs the 603e to perform the next pending data write tenure (if any), even if a pending read tenure would have normally been performed because of address pipelining. the dbwo signal does not change the order of write tenures with respect to other write tenures from the same 603e. it only allows that a write tenure be performed ahead of a pending read tenure from the same 603e.

in general, an address tenure on the bus is followed strictly in order by its associated data tenure. transactions pipelined by the 603e complete strictly in order. however, the 603e can run bus transactions out of order only when the external system allows the 603e to perform a cache-line-snoop-push-out operation (or other write transaction, if pending in the 603e write queues) between the address and data tenures of a read operation through the use of dbwo. this effectively envelops the write operation within the read operation. figure 8-23 shows how the dbwo signal is used to perform an enveloped write transaction.

figure 8-23. data bus write only transaction
(signals shown: bg, abb, aack, dbg, dbb, dbwo; a read address tenure (1) and a write address tenure (2), followed by the write data tenure (2) and then the read data tenure (1))

note that although the 603e can pipeline any write transaction behind the read transaction, special care should be used when using the enveloped write feature. it is envisioned that most system implementations will not need this capability; for these applications, dbwo should remain negated. in systems where this capability is needed, dbwo should be asserted under the following scenario:
1. the 603e initiates a read transaction (either single-beat or burst) by completing the read address tenure with no address retry.
2. then, the 603e initiates a write transaction by completing the write address tenure, with no address retry.
3. at this point, if dbwo is asserted with a qualified data bus grant to the 603e, the 603e asserts dbb and drives the write data onto the data bus, out of order with respect to the address pipeline. the write transaction concludes with the 603e negating dbb.
4. the next qualified data bus grant signals the 603e to complete the outstanding read transaction by latching the data on the bus. this assertion of dbg should not be accompanied by an asserted dbwo.

any number of bus transactions by other bus masters can be attempted between any of these steps.
note the following regarding dbwo:
- dbwo can be asserted if no data bus read is pending, but it has no effect on write ordering.
- the ordering and presence of data bus writes is determined by the writes in the write queues at the time bg is asserted for the write address (not dbg). if a particular write is desired (for example, a cache-line-snoop-push-out operation), then bg must be asserted after that particular write is in the queue, and it must be the highest priority write in the queue at that time. a cache-line-snoop-push-out operation may be the highest priority write, but more than one may be queued.
- because more than one write may be in the write queue when dbg is asserted for the write address, more than one data bus write may be enveloped by a pending data bus read.

the arbiter must monitor bus operations and coordinate the various masters and slaves with respect to the use of the data bus when dbwo is used. individual dbg signals associated with each bus device should allow the arbiter to synchronize both pipelined and split-transaction bus organizations. individual dbg and dbwo signals provide a primitive form of source-level tagging for the granting of the data bus.

note that use of the dbwo signal allows some operation-level tagging with respect to the 603e and the use of the data bus.
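the enveloped-write ordering of section 8.10 can be illustrated with a small queue model in python. everything here is an assumption made for illustration: the function name, the list of (kind, tag) tuples standing for data tenures whose address tenures have completed, and in particular the fallback to strict order when dbwo is asserted but no write is pending, which the text does not specify.

```python
def next_data_tenure(pending, dbwo):
    """pick the data tenure performed on a qualified data bus grant.

    pending is a list of (kind, tag) tuples in address-tenure order, where
    kind is "read" or "write". the chosen entry is removed from the list
    and returned; None is returned when nothing is pending.
    """
    if not pending:
        return None
    if dbwo:
        for i, (kind, _tag) in enumerate(pending):
            if kind == "write":
                # oldest pending write runs ahead of any pending read;
                # writes keep their order relative to other writes
                return pending.pop(i)
    # assumption: without dbwo, or with no write pending, keep strict order
    return pending.pop(0)
```

replaying the four-step scenario with a pending list of one read followed by one write, a grant with dbwo asserted returns the write and the following grant without dbwo returns the read.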
chapter 9 power management

the powerpc 603e microprocessor is the first microprocessor specifically designed for low-power operation. the 603e provides both automatic and program-controllable power reduction modes for progressive reduction of power consumption. this chapter describes the hardware support provided by the 603e for power management.

9.1 dynamic power management

dynamic power management automatically powers up and down the individual execution units of the 603e, based upon the contents of the instruction stream. for example, if no floating-point instructions are being executed, the floating-point unit is automatically powered down. power is not actually removed from the execution unit; instead, each execution unit has an independent clock input, which is automatically controlled on a clock-by-clock basis. since cmos circuits consume negligible power when they are not switching, stopping the clock to an execution unit effectively eliminates its power consumption. the operation of dpm is completely transparent to software or any external hardware. dynamic power management is enabled by setting bit 11 in hid0 on power-up, or following hreset.

9.2 programmable power modes

the 603e provides four power modes selectable by setting the appropriate control bits in the machine state register (msr) and hardware implementation register 0 (hid0). the four power modes are described briefly as follows:
- full-power: this is the default power state of the 603e. the 603e is fully powered and the internal functional units are operating at the full processor clock speed. if the dynamic power management mode is enabled, functional units that are idle will automatically enter a low-power state without affecting performance, software execution, or external hardware.
- doze: all the functional units of the 603e are disabled except for the time base/decrementer registers and the bus snooping logic. when the processor is in doze mode, an external asynchronous interrupt, a system management interrupt, a decrementer exception, a hard or soft reset, or a machine check brings the 603e into the full-power state. the 603e in doze mode maintains the pll in a fully powered state and locked to the system external clock input (sysclk), so a transition to the full-power state takes only a few processor clock cycles.
- nap: the nap mode further reduces power consumption by disabling bus snooping, leaving only the time base register and the pll in a powered state. the 603e returns to the full-power state upon receipt of an external asynchronous interrupt, a system management interrupt, a decrementer exception, a hard or soft reset, or a machine check input (mcp) signal. a return to full-power state from a nap state takes only a few processor clock cycles.
- sleep: sleep mode reduces power consumption to a minimum by disabling all internal functional units, after which external system logic may disable the pll and sysclk. returning the 603e to the full-power state requires the enabling of the pll and sysclk, followed by the assertion of an external asynchronous interrupt, a system management interrupt, a hard or soft reset, or a machine check input (mcp) signal after the time required to relock the pll.

the pid7v-603e implementation offers the following enhancements to the 603e family:
- lower-power design
- 2.5-volt core and 3.3-volt i/o

hardware can enable a power management state through external asynchronous interrupts. the hardware interrupt causes the transfer of program flow to interrupt handler code. the appropriate mode is then set by the software. the 603e provides a separate interrupt and interrupt vector for power management: the system management interrupt (smi). the 603e also contains a decrement timer which allows it to enter the nap or doze mode for a predetermined amount of time and then return to full-power operation through the decrementer interrupt exception.

note that the 603e cannot switch from one power management mode to another without first returning to full-on mode.
the nap and sleep modes disable bus snooping; therefore, a hardware handshake is provided to ensure coherency before the 603e enters these power management modes. table 9-1 summarizes the four power states.
table 9-1. programmable power modes

pm mode                functioning units           activation method                     full-power wake-up method
full power             all units active            -                                     -
full power (with dpm)  requested logic by demand   by instruction dispatch               -
doze                   bus snooping,               controlled by sw                      external asynchronous exceptions*,
                       data cache as needed,                                             decrementer interrupt,
                       decrementer timer                                                 reset
nap                    decrementer timer           controlled by hardware and software   external asynchronous exceptions,
                                                                                         decrementer interrupt,
                                                                                         reset
sleep                  none                        controlled by hardware and software   external asynchronous exceptions,
                                                                                         reset

9.2.1 power management modes

the following sections describe the characteristics of the 603e's power management modes, the requirements for entering and exiting the various modes, and the system capabilities provided by the 603e while the power management modes are active.

9.2.1.1 full-power mode with dpm disabled

full-power mode with dpm disabled is selected when the dpm enable bit (bit 11) in hid0 is cleared.
- default state following power-up and hreset
- all functional units are operating at full processor speed at all times

9.2.1.2 full-power mode with dpm enabled

full-power mode with dpm enabled (hid0[11] = 1) provides on-chip power management without affecting the functionality or performance of the 603e.
- required functional units are operating at full processor speed
- functional units are clocked only when needed
- no software or hardware intervention required after mode is set
- software/hardware and performance transparent
9.2.1.3 Doze Mode

Doze mode disables most functional units but maintains cache coherency by enabling the bus interface unit and snooping. A snoop hit will cause the 603e to enable the data cache, copy the data back to memory, disable the cache, and fully return to the doze state.
- Most functional units disabled
- Bus snooping and time base/decrementer still enabled
- Doze mode sequence
  - Set doze bit (HID0[8] = 1)
  - 603e enters doze mode after several processor clocks
- Several methods of returning to full-power mode
  - Assert INT, SMI, MCP or decrementer interrupts
  - Assert hard reset or soft reset
- Transition to full-power state takes no more than a few processor cycles
- PLL running and locked to SYSCLK

9.2.1.4 Nap Mode

The nap mode disables the 603e but still maintains the phase-locked loop (PLL) and the time base/decrementer. The time base can be used to restore the 603e to full-on state after a programmed amount of time. Because bus snooping is disabled for nap and sleep mode, a hardware handshake using the quiesce request (QREQ) and quiesce acknowledge (QACK) signals is required to maintain data coherency. The 603e will assert the QREQ signal to indicate that it is ready to disable bus snooping. When the system has ensured that snooping is no longer necessary, it will assert QACK and the 603e will enter the sleep or nap mode.
- Time base/decrementer still enabled
- Most functional units disabled (including bus snooping)
- All nonessential input receivers disabled
- Nap mode sequence
  - Set nap bit (HID0[9] = 1)
  - 603e asserts quiesce request (QREQ) signal
  - System asserts quiesce acknowledge (QACK) signal
  - 603e enters nap mode after several processor clocks
- Several methods of returning to full-power mode
  - Assert INT, SMI, MCP or decrementer interrupts
  - Assert hard reset or soft reset
- Transition to full-power takes no more than a few processor cycles
- PLL running and locked to SYSCLK
9.2.1.5 Sleep Mode

Sleep mode consumes the least amount of power of the four modes since all functional units are disabled. To conserve the maximum amount of power, the PLL may be disabled and the SYSCLK may be removed. Due to the fully static design of the 603e, internal processor state is preserved when no internal clock is present. Because the time base and decrementer are disabled while the 603e is in sleep mode, the 603e's time base contents will have to be updated from an external time base following sleep mode if accurate time-of-day maintenance is required. Before the 603e enters the sleep mode, the 603e will assert the QREQ signal to indicate that it is ready to disable bus snooping. When the system has ensured that snooping is no longer necessary, it will assert QACK and the 603e will enter the sleep mode.
- All functional units disabled (including bus snooping and time base)
- All nonessential input receivers disabled
  - Internal clock regenerators disabled
  - PLL still running (see below)
- Sleep mode sequence
  - Set sleep bit (HID0[10] = 1)
  - 603e asserts quiesce request (QREQ)
  - System asserts quiesce acknowledge (QACK)
  - 603e enters sleep mode after several processor clocks
- Several methods of returning to full-power mode
  - Assert INT, SMI or MCP interrupts
  - Assert hard reset or soft reset
- PLL may be disabled and SYSCLK may be removed while in sleep mode
- Return to full-power mode after PLL and SYSCLK disabled in sleep mode
  - Enable SYSCLK
  - Reconfigure PLL into desired processor clock mode
  - System logic waits for PLL startup and relock time (100 µs)
  - System logic asserts one of the sleep recovery signals (for example, INT or SMI)
9.2.2 Power Management Software Considerations

Since the 603e is a dual-issue processor with out-of-order execution capability, care must be taken in how the power management mode is entered. Furthermore, nap and sleep modes require all outstanding bus operations to be completed before the power management mode is entered. Normally during system configuration time, one of the power management modes would be selected by setting the appropriate HID0 mode bit. Later on, the power management mode is invoked by setting the MSR[POW] bit. To ensure a clean transition into and out of the power management mode, set the MSR[EE] bit and execute the following code sequence:

        sync
        mtmsr   [POW = 1]
        isync
loop:   b       loop
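The two-step procedure above (select a mode in HID0, then invoke it through the MSR) relies on PowerPC bit numbering, in which bit 0 is the most significant bit of a 32-bit register. The Python sketch below turns the bit numbers quoted in this chapter into masks. The HID0 positions (8, 9, 10, 11) come from the text; the MSR positions for POW (bit 13) and EE (bit 16) are taken from the 32-bit PowerPC architecture definition rather than from this section, and the helper names are hypothetical. Clearing the other two mode bits before setting one is a choice made for this sketch, not a stated requirement.

```python
def be_bit(n, width=32):
    """Mask for bit n in PowerPC numbering (bit 0 is the most significant)."""
    if not 0 <= n < width:
        raise ValueError("bit number out of range")
    return 1 << (width - 1 - n)


# HID0 bit positions as given in sections 9.2.1.1 through 9.2.1.5.
HID0_DOZE = be_bit(8)
HID0_NAP = be_bit(9)
HID0_SLEEP = be_bit(10)
HID0_DPM = be_bit(11)

# MSR bit positions: assumed from the PowerPC architecture, not this chapter.
MSR_POW = be_bit(13)
MSR_EE = be_bit(16)

HID0_MODE_BITS = HID0_DOZE | HID0_NAP | HID0_SLEEP


def select_mode(hid0, mode_mask):
    """Return a HID0 value with exactly one power management mode selected."""
    if mode_mask not in (HID0_DOZE, HID0_NAP, HID0_SLEEP):
        raise ValueError("mode_mask must be a single mode bit")
    return (hid0 & ~HID0_MODE_BITS & 0xFFFFFFFF) | mode_mask


def invoke_value(msr):
    """Return the MSR value to write with mtmsr: EE and POW both set."""
    return msr | MSR_EE | MSR_POW
```

With these definitions `HID0_NAP` evaluates to `0x00400000`, and `invoke_value(0)` evaluates to `0x00048000`; the actual register writes would still be done with `mtspr` and the `sync`/`mtmsr`/`isync` sequence shown above.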
Appendix A
PowerPC Instruction Set Listings

This appendix lists the PowerPC 603e microprocessor's instruction set as well as the additional PowerPC instructions not implemented in the 603e. Instructions are sorted by mnemonic, opcode, function, and form. Also included in this appendix is a quick reference table that contains general information, such as the architecture level, privilege level, and form, and indicates if the instruction is 64-bit and optional. Note that split fields, which represent the concatenation of sequences from left to right, are shown in lowercase. For more information refer to Chapter 8, "Instruction Set," in the Programming Environments Manual.

A.1 Instructions Sorted by Mnemonic

Table A-1 lists the instructions implemented in the PowerPC architecture in alphabetical order by mnemonic.

Key: in the printed table, shading marks reserved bits and instructions not implemented in the 603e.

Table A-1. Complete Instruction List Sorted by Mnemonic

Name      Notes  0-5 6-31
addx             31  d a b oe 266 rc
addcx            31  d a b oe 10 rc
addex            31  d a b oe 138 rc
addi             14  d a simm
addic            12  d a simm
addic.           13  d a simm
addis            15  d a simm
addmex           31  d a 0 0 0 0 0 oe 234 rc
addzex           31  d a 0 0 0 0 0 oe 202 rc
andx             31  s a b 28 rc
andcx            31  s a b 60 rc
andi.            28  s a uimm
andis.           29  s a uimm
bx               18  li aa lk
bcx              16  bo bi bd aa lk
bcctrx           19  bo bi 0 0 0 0 0 528 lk
bclrx            19  bo bi 0 0 0 0 0 16 lk
cmp              31  crfd 0 l a b 0 0
cmpi             11  crfd 0 l a simm
cmpl             31  crfd 0 l a b 32 0
cmpli            10  crfd 0 l a uimm
cntlzdx   4      31  s a 0 0 0 0 0 58 rc
cntlzwx          31  s a 0 0 0 0 0 26 rc
crand            19  crbd crba crbb 257 0
crandc           19  crbd crba crbb 129 0
creqv            19  crbd crba crbb 289 0
crnand           19  crbd crba crbb 225 0
crnor            19  crbd crba crbb 33 0
cror             19  crbd crba crbb 449 0
crorc            19  crbd crba crbb 417 0
crxor            19  crbd crba crbb 193 0
dcbf             31  0 0 0 0 0 a b 86 0
dcbi      1      31  0 0 0 0 0 a b 470 0
dcbst            31  0 0 0 0 0 a b 54 0
dcbt             31  0 0 0 0 0 a b 278 0
dcbtst           31  0 0 0 0 0 a b 246 0
dcbz             31  0 0 0 0 0 a b 1014 0
divdx     4      31  d a b oe 489 rc
divdux    4      31  d a b oe 457 rc
divwx            31  d a b oe 491 rc
divwux           31  d a b oe 459 rc
eciwx            31  d a b 310 0
ecowx            31  s a b 438 0
eieio            31  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 854 0
eqvx             31  s a b 284 rc
extsbx           31  s a 0 0 0 0 0 954 rc
extshx           31  s a 0 0 0 0 0 922 rc
extswx    4      31  s a 0 0 0 0 0 986 rc
fabsx     7      63  d 0 0 0 0 0 b 264 rc
faddx     7      63  d a b 0 0 0 0 0 21 rc
faddsx    7      59  d a b 0 0 0 0 0 21 rc
fcfidx    4,7    63  d 0 0 0 0 0 b 846 rc
fcmpo     7      63  crfd 0 0 a b 32 0
fcmpu     7      63  crfd 0 0 a b 0 0
fctidx    4,7    63  d 0 0 0 0 0 b 814 rc
fctidzx   4,7    63  d 0 0 0 0 0 b 815 rc
fctiwx    7      63  d 0 0 0 0 0 b 14 rc
fctiwzx   7      63  d 0 0 0 0 0 b 15 rc
fdivx     7      63  d a b 0 0 0 0 0 18 rc
fdivsx    7      59  d a b 0 0 0 0 0 18 rc
fmaddx    7      63  d a b c 29 rc
fmaddsx   7      59  d a b c 29 rc
fmrx      7      63  d 0 0 0 0 0 b 72 rc
fmsubx    7      63  d a b c 28 rc
fmsubsx   7      59  d a b c 28 rc
fmulx     7      63  d a 0 0 0 0 0 c 25 rc
fmulsx    7      59  d a 0 0 0 0 0 c 25 rc
fnabsx    7      63  d 0 0 0 0 0 b 136 rc
fnegx     7      63  d 0 0 0 0 0 b 40 rc
fnmaddx   7      63  d a b c 31 rc
fnmaddsx  7      59  d a b c 31 rc
fnmsubx   7      63  d a b c 30 rc
fnmsubsx  7      59  d a b c 30 rc
fresx     5,7    59  d 0 0 0 0 0 b 0 0 0 0 0 24 rc
frspx     7      63  d 0 0 0 0 0 b 12 rc
frsqrtex  5,7    63  d 0 0 0 0 0 b 0 0 0 0 0 26 rc
fselx     5,7    63  d a b c 23 rc
fsqrtx    5,7    63  d 0 0 0 0 0 b 0 0 0 0 0 22 rc
fsqrtsx   5,7    59  d 0 0 0 0 0 b 0 0 0 0 0 22 rc
fsubx     7      63  d a b 0 0 0 0 0 20 rc
fsubsx    7      59  d a b 0 0 0 0 0 20 rc
icbi             31  0 0 0 0 0 a b 982 0
isync            19  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 150 0
lbz              34  d a d
lbzu             35  d a d
lbzux            31  d a b 119 0
lbzx             31  d a b 87 0
ld        4      58  d a ds 0
ldarx     4      31  d a b 84 0
ldu       4      58  d a ds 1
ldux      4      31  d a b 53 0
ldx       4      31  d a b 21 0
lfd       7      50  d a d
lfdu      7      51  d a d
lfdux     7      31  d a b 631 0
lfdx      7      31  d a b 599 0
lfs       7      48  d a d
lfsu      7      49  d a d
lfsux     7      31  d a b 567 0
lfsx      7      31  d a b 535 0
lha              42  d a d
lhau             43  d a d
lhaux            31  d a b 375 0
lhax             31  d a b 343 0
lhbrx            31  d a b 790 0
lhz              40  d a d
lhzu             41  d a d
lhzux            31  d a b 311 0
lhzx             31  d a b 279 0
lmw       3      46  d a d
lswi      3      31  d a nb 597 0
lswx      3      31  d a b 533 0
lwa       4      58  d a ds 2
lwarx            31  d a b 20 0
lwaux     4      31  d a b 373 0
lwax      4      31  d a b 341 0
lwbrx            31  d a b 534 0
lwz              32  d a d
lwzu             33  d a d
lwzux            31  d a b 55 0
lwzx             31  d a b 23 0
mcrf             19  crfd 0 0 crfs 0 0 0 0 0 0 0 0 0
mcrfs     7      63  crfd 0 0 crfs 0 0 0 0 0 0 0 64 0
mcrxr            31  crfd 0 0 0 0 0 0 0 0 0 0 0 0 512 0
mfcr             31  d 0 0 0 0 0 0 0 0 0 0 19 0
mffsx     7      63  d 0 0 0 0 0 0 0 0 0 0 583 rc
mfmsr     1      31  d 0 0 0 0 0 0 0 0 0 0 83 0
mfspr     2      31  d spr 339 0
mfsr      1      31  d 0 sr 0 0 0 0 0 595 0
mfsrin    1      31  d 0 0 0 0 0 b 659 0
mftb             31  d tbr 371 0
mtcrf            31  s 0 crm 0 144 0
mtfsb0x   7      63  crbd 0 0 0 0 0 0 0 0 0 0 70 rc
mtfsb1x   7      63  crbd 0 0 0 0 0 0 0 0 0 0 38 rc
mtfsfx    7      63  0 fm 0 b 711 rc
mtfsfix   7      63  crfd 0 0 0 0 0 0 0 imm 0 134 rc
mtmsr     1      31  s 0 0 0 0 0 0 0 0 0 0 146 0
mtspr     2      31  s spr 467 0
mtsr      1      31  s 0 sr 0 0 0 0 0 210 0
mtsrin    1      31  s 0 0 0 0 0 b 242 0
mulhdx    4      31  d a b 0 73 rc
mulhdux   4      31  d a b 0 9 rc
mulhwx           31  d a b 0 75 rc
mulhwux          31  d a b 0 11 rc
mulldx    4      31  d a b oe 233 rc
mulli            07  d a simm
mullwx           31  d a b oe 235 rc
nandx            31  s a b 476 rc
negx             31  d a 0 0 0 0 0 oe 104 rc
norx             31  s a b 124 rc
orx              31  s a b 444 rc
orcx             31  s a b 412 rc
ori              24  s a uimm
oris             25  s a uimm
rfi       1      19  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 50 0
rldclx    4      30  s a b mb 8 rc
rldcrx    4      30  s a b me 9 rc
rldicx    4      30  s a sh mb 2 sh rc
rldiclx   4      30  s a sh mb 0 sh rc
rldicrx   4      30  s a sh me 1 sh rc
rldimix   4      30  s a sh mb 3 sh rc
rlwimix          20  s a sh mb me rc
rlwinmx          21  s a sh mb me rc
rlwnmx           23  s a b mb me rc
sc               17  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
slbia     1,4,5  31  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 498 0
slbie     1,4,5  31  0 0 0 0 0 0 0 0 0 0 b 434 0
sldx      4      31  s a b 27 rc
slwx             31  s a b 24 rc
sradx     4      31  s a b 794 rc
sradix    4      31  s a sh 413 sh rc
srawx            31  s a b 792 rc
srawix           31  s a sh 824 rc
srdx      4      31  s a b 539 rc
srwx             31  s a b 536 rc
stb              38  s a d
stbu             39  s a d
stbux            31  s a b 247 0
stbx             31  s a b 215 0
std       4      62  s a ds 0
stdcx.    4      31  s a b 214 1
stdu      4      62  s a ds 1
stdux     4      31  s a b 181 0
stdx      4      31  s a b 149 0
stfd      7      54  s a d
stfdu     7      55  s a d
stfdux    7      31  s a b 759 0
stfdx     7      31  s a b 727 0
stfiwx    5,7    31  s a b 983 0
stfs      7      52  s a d
stfsu     7      53  s a d
stfsux    7      31  s a b 695 0
stfsx     7      31  s a b 663 0
sth              44  s a d
sthbrx           31  s a b 918 0
sthu             45  s a d
sthux            31  s a b 439 0
sthx             31  s a b 407 0
stmw      3      47  s a d
stswi     3      31  s a nb 725 0
stswx     3      31  s a b 661 0
stw              36  s a d
stwbrx           31  s a b 662 0
stwcx.           31  s a b 150 1
stwu             37  s a d
stwux            31  s a b 183 0
stwx             31  s a b 151 0
subfx            31  d a b oe 40 rc
subfcx           31  d a b oe 8 rc
subfex           31  d a b oe 136 rc
subfic           08  d a simm
subfmex          31  d a 0 0 0 0 0 oe 232 rc
subfzex          31  d a 0 0 0 0 0 oe 200 rc
sync             31  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 598 0
td        4      31  to a b 68 0
tdi       4      02  to a simm
tlbia     1,5    31  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 370 0
tlbie     1,5    31  0 0 0 0 0 0 0 0 0 0 b 306 0
tlbld     1,6    31  0 0 0 0 0 0 0 0 0 0 b 978 0
tlbli     1,6    31  0 0 0 0 0 0 0 0 0 0 b 1010 0
tlbsync   1,5    31  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 566 0
tw               31  to a b 4 0
twi              03  to a simm
xorx             31  s a b 316 rc
xori             26  s a uimm
xoris            27  s a uimm

Notes:
1 Supervisor-level instruction
2 Supervisor- and user-level instruction
3 Load and store string or multiple instruction
4 64-bit instruction
5 Optional in the PowerPC architecture
6 Implementation-specific instruction
7 Floating-point instructions are not supported by the EC603E microprocessor and are trapped by the floating-point unavailable exception vector.
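The rows of Table A-1 give each instruction as a primary opcode in bits 0-5 followed by its remaining fields in bit order. As a rough illustration of how such a row becomes a 32-bit instruction word, the Python sketch below packs the fields of an XO-form row (such as add) and a D-form row (such as addi). The field boundaries used here (d in bits 6-10, a in 11-15, b in 16-20, oe in 21, the extended opcode in 22-30, rc in 31, and simm in 16-31) are assumed from the standard PowerPC instruction formats, since the flattened table does not preserve them; the function names are hypothetical.

```python
def field(value, start, end):
    """Place value in bits start..end of a 32-bit word (bit 0 is the MSB)."""
    width = end - start + 1
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in a %d-bit field" % width)
    return value << (31 - end)


def encode_xo_form(opcd, d, a, b, oe, xo, rc):
    """Pack an XO-form row: opcode, d, a, b, oe, 9-bit extended opcode, rc."""
    return (field(opcd, 0, 5) | field(d, 6, 10) | field(a, 11, 15)
            | field(b, 16, 20) | field(oe, 21, 21)
            | field(xo, 22, 30) | field(rc, 31, 31))


def encode_d_form(opcd, d, a, simm):
    """Pack a D-form row: opcode, d, a, 16-bit signed immediate."""
    if not -0x8000 <= simm <= 0x7FFF:
        raise ValueError("simm out of range")
    return (field(opcd, 0, 5) | field(d, 6, 10) | field(a, 11, 15)
            | field(simm & 0xFFFF, 16, 31))


# add r3,r4,r5 uses the addx row: opcode 31, extended opcode 266.
ADD_R3_R4_R5 = encode_xo_form(31, 3, 4, 5, 0, 266, 0)
# addi r3,r4,1 uses the addi row: opcode 14.
ADDI_R3_R4_1 = encode_d_form(14, 3, 4, 1)
```

Under these assumptions `ADD_R3_R4_R5` is `0x7C642A14` and `ADDI_R3_R4_1` is `0x38640001`. Setting `rc=1` or `oe=1` in `encode_xo_form` gives the record and overflow variants that the trailing x in a mnemonic stands for.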
A.2 Instructions Sorted by Opcode

Table A-2 lists the instructions defined in the PowerPC architecture in numeric order by opcode.

Key: in the printed table, shading marks reserved bits and instructions not implemented in the 603e.

Table A-2. Complete Instruction List Sorted by Opcode

Name      Notes  0-5     6-31
tdi       4      000010  to a simm
twi              000011  to a simm
mulli            000111  d a simm
subfic           001000  d a simm
cmpli            001010  crfd 0 l a uimm
cmpi             001011  crfd 0 l a simm
addic            001100  d a simm
addic.           001101  d a simm
addi             001110  d a simm
addis            001111  d a simm
bcx              010000  bo bi bd aa lk
sc               010001  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
bx               010010  li aa lk
mcrf             010011  crfd 0 0 crfs 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
bclrx            010011  bo bi 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 lk
crnor            010011  crbd crba crbb 0 0 0 0 1 0 0 0 0 1 0
rfi       1      010011  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0
crandc           010011  crbd crba crbb 0 0 1 0 0 0 0 0 0 1 0
isync            010011  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0
crxor            010011  crbd crba crbb 0 0 1 1 0 0 0 0 0 1 0
crnand           010011  crbd crba crbb 0 0 1 1 1 0 0 0 0 1 0
crand            010011  crbd crba crbb 0 1 0 0 0 0 0 0 0 1 0
creqv            010011  crbd crba crbb 0 1 0 0 1 0 0 0 0 1 0
crorc            010011  crbd crba crbb 0 1 1 0 1 0 0 0 0 1 0
cror             010011  crbd crba crbb 0 1 1 1 0 0 0 0 0 1 0
bcctrx           010011  bo bi 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 lk
rlwimix          010100  s a sh mb me rc
rlwinmx          010101  s a sh mb me rc
rlwnmx           010111  s a b mb me rc
ori              011000  s a uimm
oris             011001  s a uimm
xori             011010  s a uimm
xoris            011011  s a uimm
andi.            011100  s a uimm
andis.           011101  s a uimm
rldiclx   4      011110  s a sh mb 0 0 0 sh rc
rldicrx   4      011110  s a sh me 0 0 1 sh rc
rldicx    4      011110  s a sh mb 0 1 0 sh rc
rldimix   4      011110  s a sh mb 0 1 1 sh rc
rldclx    4      011110  s a b mb 0 1 0 0 0 rc
rldcrx    4      011110  s a b me 0 1 0 0 1 rc
cmp              011111  crfd 0 l a b 0 0 0 0 0 0 0 0 0 0 0
tw               011111  to a b 0 0 0 0 0 0 0 1 0 0 0
subfcx           011111  d a b oe 0 0 0 0 0 0 1 0 0 0 rc
mulhdux   4      011111  d a b 0 0 0 0 0 0 0 1 0 0 1 rc
addcx            011111  d a b oe 0 0 0 0 0 0 1 0 1 0 rc
mulhwux          011111  d a b 0 0 0 0 0 0 0 1 0 1 1 rc
mfcr             011111  d 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0
lwarx            011111  d a b 0 0 0 0 0 1 0 1 0 0 0
ldx       4      011111  d a b 0 0 0 0 0 1 0 1 0 1 0
lwzx             011111  d a b 0 0 0 0 0 1 0 1 1 1 0
slwx             011111  s a b 0 0 0 0 0 1 1 0 0 0 rc
cntlzwx          011111  s a 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 rc
sldx      4      011111  s a b 0 0 0 0 0 1 1 0 1 1 rc
andx             011111  s a b 0 0 0 0 0 1 1 1 0 0 rc
cmpl             011111  crfd 0 l a b 0 0 0 0 1 0 0 0 0 0 0
subfx            011111  d a b oe 0 0 0 0 1 0 1 0 0 0 rc
ldux      4      011111  d a b 0 0 0 0 1 1 0 1 0 1 0
dcbst            011111  0 0 0 0 0 a b 0 0 0 0 1 1 0 1 1 0 0
lwzux            011111  d a b 0 0 0 0 1 1 0 1 1 1 0
cntlzdx   4      011111  s a 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 rc
andcx            011111  s a b 0 0 0 0 1 1 1 1 0 0 rc
td        4      011111  to a b 0 0 0 1 0 0 0 1 0 0 0
mulhdx    4      011111  d a b 0 0 0 0 1 0 0 1 0 0 1 rc
mulhwx           011111  d a b 0 0 0 0 1 0 0 1 0 1 1 rc
mfmsr     1      011111  d 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 1 0
ldarx     4      011111  d a b 0 0 0 1 0 1 0 1 0 0 0
dcbf             011111  0 0 0 0 0 a b 0 0 0 1 0 1 0 1 1 0 0
lbzx             011111  d a b 0 0 0 1 0 1 0 1 1 1 0
negx             011111  d a 0 0 0 0 0 oe 0 0 0 1 1 0 1 0 0 0 rc
lbzux            011111  d a b 0 0 0 1 1 1 0 1 1 1 0
norx             011111  s a b 0 0 0 1 1 1 1 1 0 0 rc
subfex           011111  d a b oe 0 0 1 0 0 0 1 0 0 0 rc
addex            011111  d a b oe 0 0 1 0 0 0 1 0 1 0 rc
mtcrf            011111  s 0 crm 0 0 0 1 0 0 1 0 0 0 0 0
mtmsr     1      011111  s 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0
stdx      4      011111  s a b 0 0 1 0 0 1 0 1 0 1 0
stwcx.           011111  s a b 0 0 1 0 0 1 0 1 1 0 1
stwx             011111  s a b 0 0 1 0 0 1 0 1 1 1 0
stdux     4      011111  s a b 0 0 1 0 1 1 0 1 0 1 0
stwux            011111  s a b 0 0 1 0 1 1 0 1 1 1 0
subfzex          011111  d a 0 0 0 0 0 oe 0 0 1 1 0 0 1 0 0 0 rc
addzex           011111  d a 0 0 0 0 0 oe 0 0 1 1 0 0 1 0 1 0 rc
mtsr      1      011111  s 0 sr 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 0
stdcx.    4      011111  s a b 0 0 1 1 0 1 0 1 1 0 1
stbx             011111  s a b 0 0 1 1 0 1 0 1 1 1 0
subfmex          011111  d a 0 0 0 0 0 oe 0 0 1 1 1 0 1 0 0 0 rc
mulldx    4      011111  d a b oe 0 0 1 1 1 0 1 0 0 1 rc
addmex           011111  d a 0 0 0 0 0 oe 0 0 1 1 1 0 1 0 1 0 rc
mullwx           011111  d a b oe 0 0 1 1 1 0 1 0 1 1 rc
mtsrin    1      011111  s 0 0 0 0 0 b 0 0 1 1 1 1 0 0 1 0 0
dcbtst           011111  0 0 0 0 0 a b 0 0 1 1 1 1 0 1 1 0 0
stbux            011111  s a b 0 0 1 1 1 1 0 1 1 1 0
addx             011111  d a b oe 0 1 0 0 0 0 1 0 1 0 rc
dcbt             011111  0 0 0 0 0 a b 0 1 0 0 0 1 0 1 1 0 0
lhzx             011111  d a b 0 1 0 0 0 1 0 1 1 1 0
eqvx             011111  s a b 0 1 0 0 0 1 1 1 0 0 rc
tlbie     1,5    011111  0 0 0 0 0 0 0 0 0 0 b 0 1 0 0 1 1 0 0 1 0 0
eciwx            011111  d a b 0 1 0 0 1 1 0 1 1 0 0
lhzux            011111  d a b 0 1 0 0 1 1 0 1 1 1 0
xorx             011111  s a b 0 1 0 0 1 1 1 1 0 0 rc
mfspr     2      011111  d spr 0 1 0 1 0 1 0 0 1 1 0
lwax      4      011111  d a b 0 1 0 1 0 1 0 1 0 1 0
lhax             011111  d a b 0 1 0 1 0 1 0 1 1 1 0
tlbia     1,5    011111  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 1 0 0
mftb             011111  d tbr 0 1 0 1 1 1 0 0 1 1 0
lwaux     4      011111  d a b 0 1 0 1 1 1 0 1 0 1 0
lhaux            011111  d a b 0 1 0 1 1 1 0 1 1 1 0
sthx             011111  s a b 0 1 1 0 0 1 0 1 1 1 0
orcx             011111  s a b 0 1 1 0 0 1 1 1 0 0 rc
sradix    4      011111  s a sh 1 1 0 0 1 1 1 0 1 1 sh rc
slbie     1,4,5  011111  0 0 0 0 0 0 0 0 0 0 b 0 1 1 0 1 1 0 0 1 0 0
ecowx            011111  s a b 0 1 1 0 1 1 0 1 1 0 0
sthux            011111  s a b 0 1 1 0 1 1 0 1 1 1 0
orx              011111  s a b 0 1 1 0 1 1 1 1 0 0 rc
divdux    4      011111  d a b oe 0 1 1 1 0 0 1 0 0 1 rc
divwux           011111  d a b oe 0 1 1 1 0 0 1 0 1 1 rc
mtspr     2      011111  s spr 0 1 1 1 0 1 0 0 1 1 0
dcbi      1      011111  0 0 0 0 0 a b 0 1 1 1 0 1 0 1 1 0 0
nandx            011111  s a b 0 1 1 1 0 1 1 1 0 0 rc
divdx     4      011111  d a b oe 0 1 1 1 1 0 1 0 0 1 rc
divwx            011111  d a b oe 0 1 1 1 1 0 1 0 1 1 rc
slbia     1,4,5  011111  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 0 0
mcrxr            011111  crfd 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
lswx      3      011111  d a b 1 0 0 0 0 1 0 1 0 1 0
lwbrx            011111  d a b 1 0 0 0 0 1 0 1 1 0 0
lfsx      7      011111  d a b 1 0 0 0 0 1 0 1 1 1 0
srwx             011111  s a b 1 0 0 0 0 1 1 0 0 0 rc
srdx      4      011111  s a b 1 0 0 0 0 1 1 0 1 1 rc
tlbsync   1,5    011111  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 1 1 0 0
lfsux     7      011111  d a b 1 0 0 0 1 1 0 1 1 1 0
mfsr      1      011111  d 0 sr 0 0 0 0 0 1 0 0 1 0 1 0 0 1 1 0
lswi      3      011111  d a nb 1 0 0 1 0 1 0 1 0 1 0
sync             011111  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 1 0 0
lfdx      7      011111  d a b 1 0 0 1 0 1 0 1 1 1 0
lfdux     7      011111  d a b 1 0 0 1 1 1 0 1 1 1 0
mfsrin    1      011111  d 0 0 0 0 0 b 1 0 1 0 0 1 0 0 1 1 0
stswx     3      011111  s a b 1 0 1 0 0 1 0 1 0 1 0
stwbrx           011111  s a b 1 0 1 0 0 1 0 1 1 0 0
stfsx     7      011111  s a b 1 0 1 0 0 1 0 1 1 1 0
stfsux    7      011111  s a b 1 0 1 0 1 1 0 1 1 1 0
stswi     3      011111  s a nb 1 0 1 1 0 1 0 1 0 1 0
stfdx     7      011111  s a b 1 0 1 1 0 1 0 1 1 1 0
stfdux    7      011111  s a b 1 0 1 1 1 1 0 1 1 1 0
lhbrx            011111  d a b 1 1 0 0 0 1 0 1 1 0 0
srawx            011111  s a b 1 1 0 0 0 1 1 0 0 0 rc
sradx     4      011111  s a b 1 1 0 0 0 1 1 0 1 0 rc
srawix           011111  s a sh 1 1 0 0 1 1 1 0 0 0 rc
eieio            011111  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 1 1 0 0
sthbrx           011111  s a b 1 1 1 0 0 1 0 1 1 0 0
extshx           011111  s a 0 0 0 0 0 1 1 1 0 0 1 1 0 1 0 rc
extsbx           011111  s a 0 0 0 0 0 1 1 1 0 1 1 1 0 1 0 rc
tlbld     1,6    011111  0 0 0 0 0 0 0 0 0 0 b 1 1 1 1 0 1 0 0 1 0 0
icbi             011111  0 0 0 0 0 a b 1 1 1 1 0 1 0 1 1 0 0
stfiwx    5,7    011111  s a b 1 1 1 1 0 1 0 1 1 1 0
extswx    4      011111  s a 0 0 0 0 0 1 1 1 1 0 1 1 0 1 0 rc
tlbli     1,6    011111  0 0 0 0 0 0 0 0 0 0 b 1 1 1 1 1 1 0 0 1 0 0
dcbz             011111  0 0 0 0 0 a b 1 1 1 1 1 1 0 1 1 0 0
lwz              100000  d a d
lwzu             100001  d a d
lbz              100010  d a d
lbzu             100011  d a d
stw              100100  s a d
stwu             100101  s a d
stb              100110  s a d
stbu             100111  s a d
lhz              101000  d a d
lhzu             101001  d a d
lha              101010  d a d
lhau             101011  d a d
sth              101100  s a d
sthu             101101  s a d
lmw       3      101110  d a d
stmw      3      101111  s a d
lfs       7      110000  d a d
lfsu      7      110001  d a d
lfd       7      110010  d a d
lfdu      7      110011  d a d
stfs      7      110100  s a d
stfsu     7      110101  s a d
stfd      7      110110  s a d
stfdu     7      110111  s a d
ld        4      111010  d a ds 0 0
ldu       4      111010  d a ds 0 1
lwa       4      111010  d a ds 1 0
fdivsx    7      111011  d a b 0 0 0 0 0 1 0 0 1 0 rc
fsubsx    7      111011  d a b 0 0 0 0 0 1 0 1 0 0 rc
faddsx    7      111011  d a b 0 0 0 0 0 1 0 1 0 1 rc
fsqrtsx   5,7    111011  d 0 0 0 0 0 b 0 0 0 0 0 1 0 1 1 0 rc
fresx     5,7    111011  d 0 0 0 0 0 b 0 0 0 0 0 1 1 0 0 0 rc
fmulsx    7      111011  d a 0 0 0 0 0 c 1 1 0 0 1 rc
fmsubsx   7      111011  d a b c 1 1 1 0 0 rc
fmaddsx   7      111011  d a b c 1 1 1 0 1 rc
fnmsubsx  7      111011  d a b c 1 1 1 1 0 rc
fnmaddsx  7      111011  d a b c 1 1 1 1 1 rc
std       4      111110  s a ds 0 0
stdu      4      111110  s a ds 0 1
fcmpu     7      111111  crfd 0 0 a b 0 0 0 0 0 0 0 0 0 0 0
frspx     7      111111  d 0 0 0 0 0 b 0 0 0 0 0 0 1 1 0 0 rc
fctiwx    7      111111  d 0 0 0 0 0 b 0 0 0 0 0 0 1 1 1 0 rc
fctiwzx   7      111111  d 0 0 0 0 0 b 0 0 0 0 0 0 1 1 1 1 rc
fdivx     7      111111  d a b 0 0 0 0 0 1 0 0 1 0 rc
fsubx     7      111111  d a b 0 0 0 0 0 1 0 1 0 0 rc
faddx     7      111111  d a b 0 0 0 0 0 1 0 1 0 1 rc
fsqrtx    5,7    111111  d 0 0 0 0 0 b 0 0 0 0 0 1 0 1 1 0 rc
fselx     5,7    111111  d a b c 1 0 1 1 1 rc
fmulx     7      111111  d a 0 0 0 0 0 c 1 1 0 0 1 rc
frsqrtex  5,7    111111  d 0 0 0 0 0 b 0 0 0 0 0 1 1 0 1 0 rc
fmsubx    7      111111  d a b c 1 1 1 0 0 rc
fmaddx    7      111111  d a b c 1 1 1 0 1 rc
fnmsubx   7      111111  d a b c 1 1 1 1 0 rc
fnmaddx   7      111111  d a b c 1 1 1 1 1 rc
fcmpo     7      111111  crfd 0 0 a b 0 0 0 0 1 0 0 0 0 0 0
mtfsb1x   7      111111  crbd 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 rc
fnegx     7      111111  d 0 0 0 0 0 b 0 0 0 0 1 0 1 0 0 0 rc
mcrfs     7      111111  crfd 0 0 crfs 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
mtfsb0x   7      111111  crbd 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 rc
fmrx      7      111111  d 0 0 0 0 0 b 0 0 0 1 0 0 1 0 0 0 rc
mtfsfix   7      111111  crfd 0 0 0 0 0 0 0 imm 0 0 0 1 0 0 0 0 1 1 0 rc
fnabsx    7      111111  d 0 0 0 0 0 b 0 0 1 0 0 0 1 0 0 0 rc
fabsx     7      111111  d 0 0 0 0 0 b 0 1 0 0 0 0 1 0 0 0 rc
mffsx     7      111111  d 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 1 rc
mtfsfx    7      111111  0 fm 0 b 1 0 1 1 0 0 0 1 1 1 rc
fctidx    4,7    111111  d 0 0 0 0 0 b 1 1 0 0 1 0 1 1 1 0 rc
fctidzx   4,7    111111  d 0 0 0 0 0 b 1 1 0 0 1 0 1 1 1 1 rc
fcfidx    4,7    111111  d 0 0 0 0 0 b 1 1 0 1 0 0 1 1 1 0 rc

Notes:
1 Supervisor-level instruction
2 Supervisor- and user-level instruction
3 Load and store string or multiple instruction
4 64-bit instruction
5 Optional in the PowerPC architecture
6 603e-implementation specific instruction
7 Floating-point instructions are not supported by the EC603E microprocessor and are trapped by the floating-point unavailable exception vector.
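Table A-2 can also be used in the reverse direction, to identify an instruction word from its bit pattern. The Python sketch below is a minimal illustration of that lookup: it extracts the primary opcode (bits 0-5) and, for opcodes that use one, the 10-bit extended opcode (bits 21-30), then consults a small dictionary. Only the four instructions that appear in the power management entry sequence of section 9.2.2 are included; the dictionary, the function names, and the choice of which rows to include are all hypothetical.

```python
def primary_opcode(word):
    """Bits 0-5 of a 32-bit instruction word (bit 0 is the MSB)."""
    return (word >> 26) & 0x3F


def extended_opcode(word):
    """Bits 21-30, the extended opcode used by opcode 19 and 31 rows."""
    return (word >> 1) & 0x3FF


# (primary, extended) pairs copied from the sync, mtmsr, and isync rows.
KNOWN_EXTENDED = {
    (31, 598): "sync",
    (31, 146): "mtmsr",
    (19, 150): "isync",
}


def classify(word):
    """Name the instruction if it is one of the few known to this sketch."""
    opcd = primary_opcode(word)
    if opcd == 18:
        # Primary opcode 18 is the bx row; it has no extended opcode.
        return "b"
    return KNOWN_EXTENDED.get((opcd, extended_opcode(word)), "unknown")
```

For instance, `classify(0x7C0004AC)` gives `"sync"` and `classify(0x48000000)`, a branch to its own address, gives `"b"`. A fuller decoder would need one entry per table row and would have to treat the shorter extended opcode fields of the opcode 30, 59, and 63 rows separately.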
A.3 Instructions Grouped by Functional Categories

Table A-3 through Table A-30 list the PowerPC instructions grouped by function.

Key: in the printed tables, shading marks reserved bits and instructions not implemented in the 603e.

Table A-3. Integer Arithmetic Instructions

Name      Notes  0-5 6-31
addx             31  d a b oe 266 rc
addcx            31  d a b oe 10 rc
addex            31  d a b oe 138 rc
addi             14  d a simm
addic            12  d a simm
addic.           13  d a simm
addis            15  d a simm
addmex           31  d a 0 0 0 0 0 oe 234 rc
addzex           31  d a 0 0 0 0 0 oe 202 rc
divdx     4      31  d a b oe 489 rc
divdux    4      31  d a b oe 457 rc
divwx            31  d a b oe 491 rc
divwux           31  d a b oe 459 rc
mulhdx    4      31  d a b 0 73 rc
mulhdux   4      31  d a b 0 9 rc
mulhwx           31  d a b 0 75 rc
mulhwux          31  d a b 0 11 rc
mulldx    4      31  d a b oe 233 rc
mulli            07  d a simm
mullwx           31  d a b oe 235 rc
negx             31  d a 0 0 0 0 0 oe 104 rc
subfx            31  d a b oe 40 rc
subfcx           31  d a b oe 8 rc
subfic           08  d a simm
subfex           31  d a b oe 136 rc
subfmex          31  d a 0 0 0 0 0 oe 232 rc
subfzex          31  d a 0 0 0 0 0 oe 200 rc
Table A-4. Integer Compare Instructions

Name      Notes  0-5 6-31
cmp              31  crfd 0 l a b 0 0 0 0 0 0 0 0 0 0 0
cmpi             11  crfd 0 l a simm
cmpl             31  crfd 0 l a b 32 0
cmpli            10  crfd 0 l a uimm

Table A-5. Integer Logical Instructions

Name      Notes  0-5 6-31
andx             31  s a b 28 rc
andcx            31  s a b 60 rc
andi.            28  s a uimm
andis.           29  s a uimm
cntlzdx   4      31  s a 0 0 0 0 0 58 rc
cntlzwx          31  s a 0 0 0 0 0 26 rc
eqvx             31  s a b 284 rc
extsbx           31  s a 0 0 0 0 0 954 rc
extshx           31  s a 0 0 0 0 0 922 rc
extswx    4      31  s a 0 0 0 0 0 986 rc
nandx            31  s a b 476 rc
norx             31  s a b 124 rc
orx              31  s a b 444 rc
orcx             31  s a b 412 rc
ori              24  s a uimm
oris             25  s a uimm
xorx             31  s a b 316 rc
xori             26  s a uimm
xoris            27  s a uimm

Table A-6. Integer Rotate Instructions

Name      Notes  0-5 6-31
rldclx    4      30  s a b mb 8 rc
rldcrx    4      30  s a b me 9 rc
rldicx    4      30  s a sh mb 2 sh rc
rldiclx   4      30  s a sh mb 0 sh rc
rldicrx   4      30  s a sh me 1 sh rc
rldimix   4      30  s a sh mb 3 sh rc
rlwimix          20  s a sh mb me rc
rlwinmx          21  s a sh mb me rc
rlwnmx           23  s a b mb me rc

Table A-7. Integer Shift Instructions

Name      Notes  0-5 6-31
sldx      4      31  s a b 27 rc
slwx             31  s a b 24 rc
sradx     4      31  s a b 794 rc
sradix    4      31  s a sh 413 sh rc
srawx            31  s a b 792 rc
srawix           31  s a sh 824 rc
srdx      4      31  s a b 539 rc
srwx             31  s a b 536 rc

Table A-8. Floating-Point Arithmetic Instructions (note 7)

Name      Notes  0-5 6-31
faddx            63  d a b 0 0 0 0 0 21 rc
faddsx           59  d a b 0 0 0 0 0 21 rc
fdivx            63  d a b 0 0 0 0 0 18 rc
fdivsx           59  d a b 0 0 0 0 0 18 rc
fmulx            63  d a 0 0 0 0 0 c 25 rc
fmulsx           59  d a 0 0 0 0 0 c 25 rc
fresx     5      59  d 0 0 0 0 0 b 0 0 0 0 0 24 rc
frsqrtex  5      63  d 0 0 0 0 0 b 0 0 0 0 0 26 rc
fsubx            63  d a b 0 0 0 0 0 20 rc
fsubsx           59  d a b 0 0 0 0 0 20 rc
fselx     5      63  d a b c 23 rc
fsqrtx    5      63  d 0 0 0 0 0 b 0 0 0 0 0 22 rc
fsqrtsx   5      59  d 0 0 0 0 0 b 0 0 0 0 0 22 rc
Table A-9. Floating-Point Multiply-Add Instructions (note 7)

Name      Notes  0-5 6-31
fmaddx           63  d a b c 29 rc
fmaddsx          59  d a b c 29 rc
fmsubx           63  d a b c 28 rc
fmsubsx          59  d a b c 28 rc
fnmaddx          63  d a b c 31 rc
fnmaddsx         59  d a b c 31 rc
fnmsubx          63  d a b c 30 rc
fnmsubsx         59  d a b c 30 rc

Table A-10. Floating-Point Rounding and Conversion Instructions (note 7)

Name      Notes  0-5 6-31
fcfidx    4      63  d 0 0 0 0 0 b 846 rc
fctidx    4      63  d 0 0 0 0 0 b 814 rc
fctidzx   4      63  d 0 0 0 0 0 b 815 rc
fctiwx           63  d 0 0 0 0 0 b 14 rc
fctiwzx          63  d 0 0 0 0 0 b 15 rc
frspx            63  d 0 0 0 0 0 b 12 rc

Table A-11. Floating-Point Compare Instructions (note 7)

Name      Notes  0-5 6-31
fcmpo            63  crfd 0 0 a b 32 0
fcmpu            63  crfd 0 0 a b 0 0

Table A-12. Floating-Point Status and Control Register Instructions (note 7)

Name      Notes  0-5 6-31
mcrfs            63  crfd 0 0 crfs 0 0 0 0 0 0 0 64 0
mffsx            63  d 0 0 0 0 0 0 0 0 0 0 583 rc
mtfsb0x          63  crbd 0 0 0 0 0 0 0 0 0 0 70 rc
mtfsb1x          63  crbd 0 0 0 0 0 0 0 0 0 0 38 rc
mtfsfx           63  0 fm 0 b 711 rc
mtfsfix          63  crfd 0 0 0 0 0 0 0 imm 0 134 rc
Table A-13. Integer Load Instructions

Name      Notes  0-5 6-31
lbz              34  d a d
lbzu             35  d a d
lbzux            31  d a b 119 0
lbzx             31  d a b 87 0
ld        4      58  d a ds 0
ldu       4      58  d a ds 1
ldux      4      31  d a b 53 0
ldx       4      31  d a b 21 0
lha              42  d a d
lhau             43  d a d
lhaux            31  d a b 375 0
lhax             31  d a b 343 0
lhz              40  d a d
lhzu             41  d a d
lhzux            31  d a b 311 0
lhzx             31  d a b 279 0
lwa       4      58  d a ds 2
lwaux     4      31  d a b 373 0
lwax      4      31  d a b 341 0
lwz              32  d a d
lwzu             33  d a d
lwzux            31  d a b 55 0
lwzx             31  d a b 23 0
Table A-14. Integer Store Instructions

Name      Notes  0-5 6-31
stb              38  s a d
stbu             39  s a d
stbux            31  s a b 247 0
stbx             31  s a b 215 0
std       4      62  s a ds 0
stdu      4      62  s a ds 1
stdux     4      31  s a b 181 0
stdx      4      31  s a b 149 0
sth              44  s a d
sthu             45  s a d
sthux            31  s a b 439 0
sthx             31  s a b 407 0
stw              36  s a d
stwu             37  s a d
stwux            31  s a b 183 0
stwx             31  s a b 151 0

Table A-15. Integer Load and Store with Byte-Reverse Instructions

Name      Notes  0-5 6-31
lhbrx            31  d a b 790 0
lwbrx            31  d a b 534 0
sthbrx           31  s a b 918 0
stwbrx           31  s a b 662 0

Table A-16. Integer Load and Store Multiple Instructions

Name      Notes  0-5 6-31
lmw       3      46  d a d
stmw      3      47  s a d
Table A-17. Integer Load and Store String Instructions

Name      Notes  0-5 6-31
lswi      3      31  d a nb 597 0
lswx      3      31  d a b 533 0
stswi     3      31  s a nb 725 0
stswx     3      31  s a b 661 0

Table A-18. Memory Synchronization Instructions

Name      Notes  0-5 6-31
eieio            31  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 854 0
isync            19  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 150 0
ldarx     4      31  d a b 84 0
lwarx            31  d a b 20 0
stdcx.    4      31  s a b 214 1
stwcx.           31  s a b 150 1
sync             31  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 598 0

Table A-19. Floating-Point Load Instructions (note 7)

Name      Notes  0-5 6-31
lfd              50  d a d
lfdu             51  d a d
lfdux            31  d a b 631 0
lfdx             31  d a b 599 0
lfs              48  d a d
lfsu             49  d a d
lfsux            31  d a b 567 0
lfsx             31  d a b 535 0
Table A-20. Floating-Point Store Instructions (note 7)

Name      Notes  0-5 6-31
stfd             54  s a d
stfdu            55  s a d
stfdux           31  s a b 759 0
stfdx            31  s a b 727 0
stfiwx    5      31  s a b 983 0
stfs             52  s a d
stfsu            53  s a d
stfsux           31  s a b 695 0
stfsx            31  s a b 663 0

Table A-21. Floating-Point Move Instructions (note 7)

Name      Notes  0-5 6-31
fabsx            63  d 0 0 0 0 0 b 264 rc
fmrx             63  d 0 0 0 0 0 b 72 rc
fnabsx           63  d 0 0 0 0 0 b 136 rc
fnegx            63  d 0 0 0 0 0 b 40 rc

Table A-22. Branch Instructions

Name      Notes  0-5 6-31
bx               18  li aa lk
bcx              16  bo bi bd aa lk
bcctrx           19  bo bi 0 0 0 0 0 528 lk
bclrx            19  bo bi 0 0 0 0 0 16 lk

Table A-23. Condition Register Logical Instructions

Name      Notes  0-5 6-31
crand            19  crbd crba crbb 257 0
crandc           19  crbd crba crbb 129 0
creqv            19  crbd crba crbb 289 0
crnand           19  crbd crba crbb 225 0
crnor            19  crbd crba crbb 33 0
Table A-23. Condition Register Logical Instructions (continued)

Name     Note  0-5  6-10     11-15    16-20  21-30       31
cror           19   crbD     crbA     crbB   449         0
crorc          19   crbD     crbA     crbB   417         0
crxor          19   crbD     crbA     crbB   193         0
mcrf           19   crfD 00  crfS 00  00000  0000000000  0

Table A-24. System Linkage Instructions

Name     Note  0-5  Remaining fields
rfi      1     19   00000 (bits 6-10), 00000 (bits 11-15), 00000 (bits 16-20), 50 (bits 21-30), 0 (bit 31)
sc             17   all zeros (bits 6-29), 1 (bit 30), 0 (bit 31)

Table A-25. Trap Instructions

Name     Note  0-5  6-10  11-15  16-20  21-30  31
td       4     31   TO    A      B      68     0
tdi      4     02   TO    A      SIMM (bits 16-31)
tw             31   TO    A      B      4      0
twi            03   TO    A      SIMM (bits 16-31)

Table A-26. Processor Control Instructions

Name     Note  0-5  6-10     11-15  16-20  21-30  31
mcrxr          31   crfD 00  00000  00000  512    0
mfcr           31   D        00000  00000  19     0
mfmsr    1     31   D        00000  00000  83     0
mfspr    2     31   D        spr (bits 11-20)  339    0
mftb           31   D        tbr (bits 11-20)  371    0
mtcrf          31   S        0, CRM, 0 (bits 11-20)  144    0
mtmsr    1     31   S        00000  00000  146    0
mtspr    2     31   D        spr (bits 11-20)  467    0
Table A-27. Cache Management Instructions

Name     Note  0-5  6-10   11-15  16-20  21-30  31
dcbf           31   00000  A      B      86     0
dcbi     1     31   00000  A      B      470    0
dcbst          31   00000  A      B      54     0
dcbt           31   00000  A      B      278    0
dcbtst         31   00000  A      B      246    0
dcbz           31   00000  A      B      1014   0
icbi           31   00000  A      B      982    0

Table A-28. Segment Register Manipulation Instructions

Name     Note  0-5  6-10  11-15  16-20  21-30  31
mfsr     1     31   D     0, SR  00000  595    0
mfsrin   1     31   D     00000  B      659    0
mtsr     1     31   S     0, SR  00000  210    0
mtsrin   1     31   S     00000  B      242    0

Table A-29. Lookaside Buffer Management Instructions

Name     Note   0-5  6-10   11-15  16-20  21-30  31
slbia    1,4,5  31   00000  00000  00000  498    0
slbie    1,4,5  31   00000  00000  B      434    0
tlbia    1,5    31   00000  00000  00000  370    0
tlbie    1,5    31   00000  00000  B      306    0
tlbld    1,6    31   00000  00000  B      978    0
tlbli    1,6    31   00000  00000  B      1010   0
tlbsync  1,5    31   00000  00000  00000  566    0
Table A-30. External Control Instructions

Name     Note  0-5  6-10  11-15  16-20  21-30  31
eciwx          31   D     A      B      310    0
ecowx          31   S     A      B      438    0

Notes:
1  Supervisor-level instruction
2  Supervisor- and user-level instruction
3  Load and store string or multiple instruction
4  64-bit instruction
5  Optional in the PowerPC architecture
6  603e-implementation specific instruction
7  Floating-point instructions are not supported by the EC603E microprocessor and are trapped by the floating-point unavailable exception vector.
A.4 Instructions Sorted by Form

Table A-31 through Table A-45 list the PowerPC instructions grouped by form.

Table A-31. I-Form

Format:
OPCD  LI  AA  LK

Specific instruction:
Name  0-5  6-29  30  31
bx    18   LI    AA  LK

Table A-32. B-Form

Format:
OPCD  BO  BI  BD  AA  LK

Specific instruction:
Name  0-5  6-10  11-15  16-29  30  31
bcx   16   BO    BI     BD     AA  LK

Table A-33. SC-Form

Format:
OPCD  all zeros (bits 6-29)  1 (bit 30)  0 (bit 31)

Specific instruction:
Name  0-5  6-29       30  31
sc    17   all zeros  1   0

Table A-34. D-Form

Formats:
OPCD  D          A  d
OPCD  D          A  SIMM
OPCD  S          A  d
OPCD  S          A  UIMM
OPCD  crfD 0 L   A  SIMM
OPCD  crfD 0 L   A  UIMM
OPCD  TO         A  SIMM

Key (shading in the original tables): reserved bits; instruction not implemented in the 603e.
motorola appendix a. powerpc instruction set listings a-29 speci? instructions name 0 5678910111213141516171819202122232425262728293031 addi 14 d a simm addic 12 d a simm addic. 13 d a simm addis 15 d a simm andi. 28 s a uimm andis. 29 s a uimm cmpi 11 crfd 0 l a simm cmpli 10 crfd 0 l a uimm lbz 34 d a d lbzu 35 d a d lfd 7 50 d a d lfdu 7 51 d a d lfs 7 48 d a d lfsu 7 49 d a d lha 42 d a d lhau 43 d a d lhz 40 d a d lhzu 41 d a d lmw 3 46 d a d lwz 32 d a d lwzu 33 d a d mulli 7 d a simm ori 24 s a uimm oris 25 s a uimm stb 38 s a d stbu 39 s a d stfd 7 54 s a d stfdu 7 55 s a d stfs 7 52 s a d stfsu 7 53 s a d sth 44 s a d sthu 45 s a d
a-30 mpc603e & EC603E risc microprocessors user's manual motorola table a-35. ds-form table a-36. x-form stmw 3 47 s a d stw 36 s a d stwu 37 s a d subfic 08 d a simm tdi 4 02 to a simm twi 03 to a simm xori 26 s a uimm xoris 27 s a uimm opcd d a ds xo opcd s a ds xo speci? instructions name 0 5678910111213141516171819202122232425262728293031 ld 4 58 d a ds 0 ldu 4 58 d a ds 1 lwa 4 58 d a ds 2 std 4 62 s a ds 0 stdu 4 62 s a ds 1 opcd d a b xo 0 opcd d a nb xo 0 opcd d 0 0 0 0 0 b xo 0 opcd d 0 0 0 0 0 0 0 0 0 0 xo 0 opcd d 0 sr 0 0 0 0 0 xo 0 opcd s a b xo rc opcd s a b xo 1 opcd s a b xo 0 opcd s a nb xo 0 opcd s a 0 0 0 0 0 xo rc opcd s 0 0 0 0 0 b xo 0 opcd s 0 0 0 0 0 0 0 0 0 0 xo 0 opcd s 0 sr 0 0 0 0 0 xo 0 opcd s a sh xo rc opcd crfd 0 l a b xo 0
motorola appendix a. powerpc instruction set listings a-31 opcd crfd 0 0 a b xo 0 opcd crfd 0 0 crfs 0 0 0 0 0 0 0 xo 0 opcd crfd 0 0 0 0 0 0 0 0 0 0 0 0 xo 0 opcd crfd 0 0 0 0 0 0 0 imm 0xorc opcd to a b xo 0 opcd d 0 0 0 0 0 b xo rc opcd d 0 0 0 0 0 0 0 0 0 0 xo rc opcd crbd 0 0 0 0 0 0 0 0 0 0 xo rc opcd 0 0 0 0 0 a b xo 0 opcd 0 0 0 0 0 0 0 0 0 0 b xo 0 opcd 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 xo 0 speci? instructions and x 31 s a b 28 rc andc x 31 s a b 60 rc cmp 31 crfd 0l a b 0 0 cmpl 31 crfd 0l a b 32 0 cntlzd x 4 31 s a 0 0 0 0 0 58 rc cntlzw x 31 s a 0 0 0 0 0 26 rc dcbf 31 0 0 0 0 0 a b 86 0 dcbi 1 31 0 0 0 0 0 a b 470 0 dcbst 31 0 0 0 0 0 a b 54 0 dcbt 31 0 0 0 0 0 a b 278 0 dcbtst 31 0 0 0 0 0 a b 246 0 dcbz 31 0 0 0 0 0 a b 1014 0 eciwx 31 d a b 310 0 ecowx 31 s a b 438 0 eieio 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 854 0 eqv x 31 s a b 284 rc extsb x 31 s a 0 0 0 0 0 954 rc extsh x 31 s a 0 0 0 0 0 922 rc extsw x 4 31 s a 0 0 0 0 0 986 rc fabs x 7 63 d 0 0 0 0 0 b 264 rc fcfid x 4,7 63 d 0 0 0 0 0 b 846 rc fcmpo 7 63 crfd 0 0 a b 32 0
a-32 mpc603e & EC603E risc microprocessors user's manual motorola fcmpu 7 63 crfd 0 0 a b 0 0 fctid x 4,7 63 d 0 0 0 0 0 b 814 rc fctidz x 4,7 63 d 0 0 0 0 0 b 815 rc fctiw x 7 63 d 0 0 0 0 0 b 14 rc fctiwz x 7 63 d 0 0 0 0 0 b 15 rc fmr x 7 63 d 0 0 0 0 0 b 72 rc fnabs x 7 63 d 0 0 0 0 0 b 136 rc fneg x 7 63 d 0 0 0 0 0 b 40 rc frsp x 7 63 d 0 0 0 0 0 b 12 rc icbi 31 0 0 0 0 0 a b 982 0 lbzux 31 d a b 119 0 lbzx 31 d a b 87 0 ldarx 4 31 d a b 84 0 ldux 4 31 d a b 53 0 ldx 4 31 d a b 21 0 lfdux 7 31 d a b 631 0 lfdx 7 31 d a b 599 0 lfsux 7 31 d a b 567 0 lfsx 7 31 d a b 535 0 lhaux 31 d a b 375 0 lhax 31 d a b 343 0 lhbrx 31 d a b 790 0 lhzux 31 d a b 311 0 lhzx 31 d a b 279 0 lswi 3 31 d a nb 597 0 lswx 3 31 d a b 533 0 lwarx 31 d a b 20 0 lwaux 4 31 d a b 373 0 lwax 4 31 d a b 341 0 lwbrx 31 d a b 534 0 lwzux 31 d a b 55 0 lwzx 31 d a b 23 0 mcrfs 63 crfd 0 0 crfs 0 0 0 0 0 0 0 64 0 mcrxr 31 crfd 0 0 0 0 0 0 0 0 0 0 0 0 512 0 mfcr 31 d 0 0 0 0 0 0 0 0 0 0 19 0
motorola appendix a. powerpc instruction set listings a-33 mffs x 7 63 d 0 0 0 0 0 0 0 0 0 0 583 rc mfmsr 1 31 d 0 0 0 0 0 0 0 0 0 0 83 0 mfsr 1 31 d 0 sr 0 0 0 0 0 595 0 mfsrin 1 31 d 0 0 0 0 0 b 659 0 mtfsb0 x 7 63 crbd 0 0 0 0 0 0 0 0 0 0 70 rc mtfsb1 x 7 63 crfd 0 0 0 0 0 0 0 0 0 0 38 rc mtfsfi x 7 63 crbd 0 0 0 0 0 0 0 imm 0 134 rc mtmsr 1 31 s 0 0 0 0 0 0 0 0 0 0 146 0 mtsr 1 31 s 0 sr 0 0 0 0 0 210 0 mtsrin 1 31 s 0 0 0 0 0 b 242 0 nand x 31 s a b 476 rc nor x 31 s a b 124 rc or x 31 s a b 444 rc orc x 31 s a b 412 rc slbia 1,4,5 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 498 0 slbie 1,4,5 31 0 0 0 0 0 0 0 0 0 0 b 434 0 sld x 4 31 s a b 27 rc slw x 31 s a b 24 rc srad x 4 31 s a b 794 rc sraw x 31 s a b 792 rc srawi x 31 s a sh 824 rc srd x 4 31 s a b 539 rc srw x 31 s a b 536 rc stbux 31 s a b 247 0 stbx 31 s a b 215 0 stdcx. 4 31 s a b 214 1 stdux 4 31 s a b 181 0 stdx 4 31 s a b 149 0 stfdux 7 31 s a b 759 0 stfdx 7 31 s a b 727 0 stfiwx 5,7 31 s a b 983 0 stfsux 7 31 s a b 695 0 stfsx 7 31 s a b 663 0 sthbrx 31 s a b 918 0 sthux 31 s a b 439 0
a-34 mpc603e & EC603E risc microprocessors user's manual motorola table a-37. xl-form sthx 31 s a b 407 0 stswi 3 31 s a nb 725 0 stswx 3 31 s a b 661 0 stwbrx 31 s a b 662 0 stwcx. 31 s a b 150 1 stwux 31 s a b 183 0 stwx 31 s a b 151 0 sync 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 598 0 td 4 31 to a b 68 0 tlbia 1,5 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 370 0 tlbie 1,5 31 0 0 0 0 0 0 0 0 0 0 b 306 0 tlbld 1,6 31 0 0 0 0 0 0 0 0 0 0 b 978 0 tlbli 1,6 31 0 0 0 0 0 0 0 0 0 0 b 1010 0 tlbsync 1,5 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 566 0 tw 31 to a b 4 0 xor x 31 s a b 316 rc opcd bo bi 0 0 0 0 0 xo lk opcd crbd crba crbb xo 0 opcd crfd 0 0 crfs 0 0 0 0 0 0 0 xo 0 opcd 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 xo 0 speci? instructions name 0 5678910111213141516171819202122232425262728293031 bcctr x 19 bo bi 0 0 0 0 0 528 lk bclr x 19 bo bi 0 0 0 0 0 16 lk crand 19 crbd crba crbb 257 0 crandc 19 crbd crba crbb 129 0 creqv 19 crbd crba crbb 289 0 crnand 19 crbd crba crbb 225 0 crnor 19 crbd crba crbb 33 0 cror 19 crbd crba crbb 449 0 crorc 19 crbd crba crbb 417 0 crxor 19 crbd crba crbb 193 0
motorola appendix a. powerpc instruction set listings a-35 table a-38. xfx-form table a-39. xfl-form table a-40. xs-form table a-41. xo-form isync 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 150 0 mcrf 19 crfd 0 0 crfs 0 0 0 0 0 0 0 0 0 rfi 1 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 50 0 opcd d spr xo 0 opcd d 0 crm 0xo 0 opcd s spr xo 0 opcd d tbr xo 0 speci? instructions name 0 5678910111213141516171819202122232425262728293031 mfspr 2 31 d spr 339 0 mftb 31 d tbr 371 0 mtcrf 31 s 0 crm 0 144 0 mtspr 2 31 d spr 467 0 opcd 0 fm 0 b xo rc speci? instructions name 0 5678910111213141516171819202122232425262728293031 mtfsf x 7 63 0 fm 0 b 711 rc opcd s a sh xo sh rc speci? instructions name 0 5678910111213141516171819202122232425262728293031 sradi x 4 31 s a sh 413 sh rc opcd d a b oe xo rc opcd d a b 0xorc opcd d a 0 0 0 0 0 oe xo rc speci? instructions name 0 5678910111213141516171819202122232425262728293031 add x 31 d a b oe 266 rc addc x 31 d a b oe 10 rc
a-36 mpc603e & EC603E risc microprocessors user's manual motorola table a-42. a-form adde x 31 d a b oe 138 rc addme x 31 d a 0 0 0 0 0 oe 234 rc addze x 31 d a 0 0 0 0 0 oe 202 rc divd x 4 31 d a b oe 489 rc divdu x 4 31 d a b oe 457 rc divw x 31 d a b oe 491 rc divwu x 31 d a b oe 459 rc mulhd x 4 31 d a b 0 73 rc mulhdu x 4 31 d a b 0 9 rc mulhw x 31 d a b 075rc mulhwu x 31 d a b 011rc mulld x 4 31 d a b oe 233 rc mullw x 31 d a b oe 235 rc neg x 31 d a 0 0 0 0 0 oe 104 rc subf x 31 d a b oe 40 rc subfc x 31 d a b oe 8 rc subfe x 31 d a b oe 136 rc subfme x 31 d a 0 0 0 0 0 oe 232 rc subfze x 31 d a 0 0 0 0 0 oe 200 rc opcd d a b 0 0 0 0 0 xo rc opcd d a b c xo rc opcd d a 0 0 0 0 0 c xo rc opcd d 0 0 0 0 0 b 0 0 0 0 0 xo rc speci? instructions name 0 5678910111213141516171819202122232425262728293031 fadd x 7 63 d a b 0 0 0 0 0 21 rc fadds x 7 59 d a b 0 0 0 0 0 21 rc fdiv x 7 63 d a b 0 0 0 0 0 18 rc fdivs x 7 59 d a b 0 0 0 0 0 18 rc fmadd x 7 63 d a b c 29 rc fmadds x 7 59 d a b c 29 rc fmsub x 7 63 d a b c 28 rc fmsubs x 7 59 d a b c 28 rc
motorola appendix a. powerpc instruction set listings a-37 table a-43. m-form table a-44. md-form fmul x 7 63 d a 0 0 0 0 0 c 25 rc fmuls x 7 59 d a 0 0 0 0 0 c 25 rc fnmadd x 7 63 d a b c 31 rc fnmadds x 7 59 d a b c 31 rc fnmsub x 7 63 d a b c 30 rc fnmsubs x 7 59 d a b c 30 rc fres x 5? 59 d 0 0 0 0 0 b 0 0 0 0 0 24 rc frsqrte x 5,7 63 d 0 0 0 0 0 b 0 0 0 0 0 26 rc fsel x 5.7 63 d a b c 23 rc fsqrt x 5,7 63 d 0 0 0 0 0 b 0 0 0 0 0 22 rc fsqrts x 5,7 59 d 0 0 0 0 0 b 0 0 0 0 0 22 rc fsub x 7 63 d a b 0 0 0 0 0 20 rc fsubs x 7 59 d a b 0 0 0 0 0 20 rc opcd s a sh mb me rc opcd s a b mb me rc speci? instructions name 0 5678910111213141516171819202122232425262728293031 rlwimi x 20 s a sh mb me rc rlwinm x 21 s a sh mb me rc rlwnm x 23 s a b mb me rc opcd s a sh mb xo sh rc opcd s a sh me xo sh rc speci? instructions name 0 5678910111213141516171819202122232425262728293031 rldic x 4 30 s a sh mb 2 sh rc rldicl x 4 30 s a sh mb 0 sh rc rldicr x 4 30 s a sh me 1 sh rc rldimi x 4 30 s a sh mb 3 sh rc
Table A-45. MDS-Form

Formats:
OPCD  S  A  B  mb  XO  Rc
OPCD  S  A  B  me  XO  Rc

Specific instructions:
Name     Note  0-5  6-10  11-15  16-20  21-26  27-30  31
rldclx   4     30   S     A      B      mb     8      Rc
rldcrx   4     30   S     A      B      me     9      Rc

Notes:
1  Supervisor-level instruction
2  Supervisor- and user-level instruction
3  Load and store string or multiple instruction
4  64-bit instruction
5  Optional in the PowerPC architecture
6  603e-implementation specific instruction
7  Floating-point instructions are not supported by the EC603E microprocessor and are trapped by the floating-point unavailable exception vector.
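As a rough illustration of how the form tables in this section can be read, the following Python sketch extracts fields from a 32-bit instruction word using the tables' bit numbering (bit 0 is the most significant bit). The function names are hypothetical and not part of any Motorola tool; the opcode and extended-opcode values in the usage note come from the tables (addi is primary opcode 14, lwbrx is opcode 31 with XO 534).

```python
def field(word, first, last):
    """Return bits first..last (inclusive) of a 32-bit word; bit 0 is the MSB."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def decode_d_form(word):
    """Split a D-form word into OPCD, D/S, A and a sign-extended 16-bit immediate.

    Sign extension is appropriate for SIMM and d; UIMM forms would use the
    raw 16-bit value instead.
    """
    imm = field(word, 16, 31)
    if imm & 0x8000:
        imm -= 0x10000
    return {"opcd": field(word, 0, 5), "d": field(word, 6, 10),
            "a": field(word, 11, 15), "imm": imm}


def decode_x_form(word):
    """Split an X-form word into OPCD, D/S, A, B, XO (bits 21-30) and Rc (bit 31)."""
    return {"opcd": field(word, 0, 5), "d": field(word, 6, 10),
            "a": field(word, 11, 15), "b": field(word, 16, 20),
            "xo": field(word, 21, 30), "rc": field(word, 31, 31)}


def encode_x_form(opcd, d, a, b, xo, rc=0):
    """Assemble an X-form word from its fields (inverse of decode_x_form)."""
    return (opcd << 26) | (d << 21) | (a << 16) | (b << 11) | (xo << 1) | rc
```

For example, `decode_d_form(0x3861FFF8)` yields opcode 14 with D = 3, A = 1 and an immediate of -8, and `encode_x_form(31, 4, 5, 6, 534)` assembles an lwbrx word with D = 4, A = 5, B = 6.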
motorola appendix a. powerpc instruction set listings a-39 a.5 instruction set legend table a-46 provides general information on the powerpc instruction set (such as the architectural level, privilege level, and form). table a-46. powerpc instruction set legend uisa vea oea supervisor level 64-bit optional form add x ? xo addc x ? xo adde x ? xo addi ? d addic ? d addic. ? d addis ? d addme x ? xo addze x ? xo and x ? x andc x ? x andi. ? d andis. ? d b x ? i bc x ? b bcctr x ? xl bclr x ? xl cmp ? x cmpi ? d cmpl ? x cmpli ? d cntlzd x 4 ? ? x cntlzw x ? x crand ? xl crandc ? xl creqv ? xl reserved bits key: instruction not implemented in the 603e
a-40 mpc603e & EC603E risc microprocessors user's manual motorola crnand ? xl crnor ? xl cror ? xl crorc ? xl crxor ? xl dcbf ? x dcbi 1 ?? x dcbst ? x dcbt ? x dcbtst ? x dcbz ? x divd x 4 ? ? xo divdu x 4 ? ? xo divw x ? xo divwu x ? xo eciwx ? ? x ecowx ? ? x eieio ? x eqv x ? x extsb x ? x extsh x ? x extsw x 4 ? ? x fabs x 7 ? x fadd x 7 ? a fadds x 7 ? a fcfid x 4,7 ? ? x fcmpo 7 ? x fcmpu 7 ? x fctid x 4,7 ? ? x fctidz x 7,4 ? ? x fctiw x 7 ? x fctiwz x 7 ? x fdiv x 7 ? a fdivs x 7 ? a fmadd x 7 ? a
motorola appendix a. powerpc instruction set listings a-41 fmadds x 7 ? a fmr x 7 ? x fmsub x 7 ? a fmsubs x 7 ? a fmul x 7 ? a fmuls x 7 ? a fnabs x 7 ? x fneg x 7 ? x fnmadd x 7 ? a fnmadds x 7 ? a fnmsub x 7 ? a fnmsubs x 7 ? a fres x 5,7 ?? a frsp x 7 ? x frsqrte x 5,7 ?? a fsel x 5,7 ?? a fsqrt x 5,7 ? ? a fsqrts x 5,7 ? ? a fsub x 7 ? a fsubs x 7 ? a icbi ? x isync ? xl lbz ? d lbzu ? d lbzux ? x lbzx ? x ld 4 ? ? ds ldarx 4 ? ? x ldu 4 ? ? ds ldux 4 ? ? x ldx 4 ? ? x lfd 7 ? d lfdu 7 ? d lfdux 7 ? x lfdx 7 ? x
a-42 mpc603e & EC603E risc microprocessors user's manual motorola uisa vea oea supervisor level 64-bit optional form lfs 7 ? d lfsu 7 ? d lfsux 7 ? x lfsx 7 ? x lha ? d lhau ? d lhaux ? x lhax ? x lhbrx ? x lhz ? d lhzu ? d lhzux ? x lhzx ? x lmw 3 ? d lswi 3 ? x lswx 3 ? x lwa 4 ? ? ds lwarx ? x lwaux 4 ? ? x lwax 4 ? ? x lwbrx ? x lwz ? d lwzu ? d lwzux ? x lwzx ? x mcrf ? xl mcrfs 7 ? x mcrxr ? x mfcr ? x mffs x 7 ? x mfmsr 1 ?? x mfspr 2 ? ?? xfx mfsr 1 ?? x mfsrin 1 ?? x
motorola appendix a. powerpc instruction set listings a-43 uisa vea oea supervisor level 64-bit optional form mftb ? xfx mtcrf ? xfx mtfsb0 x 7 ? x mtfsb1 x 7 ? x mtfsf x 7 ? xfl mtfsfi x 7 ? x mtmsr 1 ?? x mtspr 2 ? ?? xfx mtsr 1 ?? x mtsrin 1 ?? x mulhd x 4 ? ? xo mulhdu x 4 ? ? xo mulhw x ? xo mulhwu x ? xo mulld x 4 ? ? xo mulli ? d mullw x ? xo nand x ? x neg x ? xo nor x ? x or x ? x orc x ? x ori ? d oris ? d rfi 1 ?? xl rldcl x 4 ? ? mds rldcr x 4 ? ? mds rldic x 4 ? ? md rldicl x 4 ? ? md rldicr x 4 ? ? md rldimi x 4 ? ? md rlwimi x ? m rlwinm x ? m rlwnm x ? m
a-44 mpc603e & EC603E risc microprocessors user's manual motorola uisa vea oea supervisor level 64-bit optional form sc ? ? sc slbia 1,4,5 ? ? ? ? x slbie 1,4,5 ? ? ? ? x sld x 4 ? ? x slw x ? x srad x 4 ? ? x sradi x 4 ? ? xs sraw x ? x srawi x ? x srd x 4 ? ? x srw x ? x stb ? d stbu ? d stbux ? x stbx ? x std 4 ? ? ds stdcx. 4 ? ? x stdu 4 ? ? ds stdux 4 ? ? x stdx 4 ? ? x stfd 7 ? d stfdu 7 ? d stfdux 7 ? x stfdx 7 ? x stfiwx 5,7 ?? x stfs 7 ? d stfsu 7 ? d stfsux 7 ? x stfsx 7 ? x sth ? d sthbrx ? x sthu ? d sthux ? x sthx ? x
motorola appendix a. powerpc instruction set listings a-45 stmw 3 ? d stswi 3 ? x stswx 3 ? x stw ? d stwbrx ? x stwcx. ? x stwu ? d stwux ? x stwx ? x subf x ? xo subfc x ? xo subfe x ? xo subfic ? d subfme x ? xo subfze x ? xo sync ? x td 4 ? ? x tdi 4 ? ? d tlbia 1,5 ? ? ? x tlbie 1,5 ?? ? x tlbld 1,6 ? x tlbli 1,6 ? x tlbsync 1,5 ?? x tw ? x twi ? d xor x ? x xori ? d xoris ? d 1 supervisor-level instruction 2 supervisor- and user-level instruction 3 load and store string or multiple instruction 4 64-bit instruction 5 optional in the powerpc architecture 6 603e-implementation specific instruction 7 floating-point instructions are not supported by the EC603E microprocessor andare trapped by the ?ating-point unavailable exception vector.
Appendix B
Instructions Not Implemented

This appendix provides a list of the 32-bit and 64-bit PowerPC instructions that are not implemented in the PowerPC 603e microprocessor. It also provides a list of the floating-point instructions that are not supported by the EC603E microprocessor and the 64-bit SPR encoding that is not implemented by the 603e. Note that any attempt to execute instructions that are not implemented on the 603e will generate an illegal instruction exception. Note that exceptions are referred to as interrupts in the architecture specification.

Table B-1 provides the 32-bit PowerPC instructions that are optional to the PowerPC architecture but not implemented by the 603e. Table B-2 provides a list of 64-bit instructions that are not implemented by the 603e and EC603E microprocessors.

Table B-1. 32-Bit Instructions Not Implemented by the PowerPC 603e

Mnemonic  Instruction
fsqrt     Floating square root (double-precision)
fsqrts    Floating square root single
tlbia     TLB invalidate all

Table B-2. 64-Bit Instructions Not Implemented

Mnemonic  Instruction
cntlzd    Count leading zeros double word
divd      Divide double word
divdu     Divide double word unsigned
extsw     Extend sign word
fcfid     Floating convert from integer double word
fctid     Floating convert to integer double word
fctidz    Floating convert to integer double word with round toward zero
ld        Load double word
ldarx     Load double word and reserve indexed
ldu       Load double word with update
ldux      Load double word with update indexed
ldx       Load double word indexed
lwa       Load word algebraic
lwaux     Load word algebraic with update indexed
lwax      Load word algebraic indexed
mulld     Multiply low double word
mulhd     Multiply high double word
mulhdu    Multiply high double word unsigned
rldcl     Rotate left double word then clear left
rldcr     Rotate left double word then clear right
rldic     Rotate left double word immediate then clear
rldicl    Rotate left double word immediate then clear left
rldicr    Rotate left double word immediate then clear right
rldimi    Rotate left double word immediate then mask insert
slbia     SLB invalidate all
slbie     SLB invalidate entry
sld       Shift left double word
srad      Shift right algebraic double word
sradi     Shift right algebraic double word immediate
srd       Shift right double word
std       Store double word
stdcx.    Store double word conditional indexed
stdu      Store double word with update
stdux     Store double word indexed with update
stdx      Store double word indexed
td        Trap double word
tdi       Trap double word immediate
Table B-3 lists floating-point instructions that are not supported by the EC603E microprocessor. The EC603E microprocessor does not support the floating-point unit; therefore, floating-point instructions are trapped by the floating-point unavailable exception vector, but they may be emulated by software.

Table B-3. Floating-Point Instructions Not Supported by the EC603E Microprocessor

Mnemonic  Instruction
fabs      Floating absolute
fadd      Floating add
fadds     Floating add single
fcmpo     Floating compare ordered
fcmpu     Floating compare unordered
fctiw     Floating convert to integer word
fctiwz    Floating convert to integer word with round toward zero
fdiv      Floating divide
fdivs     Floating divide single
fmadd     Floating multiply add
fmadds    Floating multiply add single
fmr       Floating move register
fmsub     Floating multiply subtract
fmsubs    Floating multiply subtract single
fmul      Floating multiply
fmuls     Floating multiply single
fnabs     Floating negative absolute
fneg      Floating negate
fnmadd    Floating negative multiply-add (double-precision)
fnmadds   Floating negative multiply-add single
fnmsub    Floating negative multiply-subtract (double-precision)
fnmsubs   Floating negative multiply-subtract single
fres      Floating reciprocal estimate single
frsp      Floating round to single
frsqrte   Floating reciprocal square root estimate
fsel      Floating select
fsqrt     Floating square root (double-precision)
fsqrts    Floating square root single
fsub      Floating subtract
fsubs     Floating subtract single
lfd       Load floating-point double
lfdu      Load floating-point double with update
lfdux     Load floating-point double with update indexed
lfdx      Load floating-point double indexed
lfs       Load floating-point single
lfsu      Load floating-point single with update
lfsux     Load floating-point single with update indexed
lfsx      Load floating-point single indexed
mcrfs     Move to condition register from FPSCR
mffs      Move from FPSCR
mtfsb0    Move to FPSCR bit 0
mtfsb1    Move to FPSCR bit 1
mtfsf     Move to FPSCR fields
mtfsfi    Move to FPSCR field immediate
stfd      Store floating-point double
stfdu     Store floating-point double with update
stfdux    Store floating-point double with update indexed
stfdx     Store floating-point double indexed
stfiwx    Store floating-point as integer word indexed
stfs      Store floating-point single
stfsu     Store floating-point single with update
stfsux    Store floating-point single with update indexed
stfsx     Store floating-point single indexed
tlbia     TLB invalidate all

Table B-4 provides the 64-bit SPR encoding that is not implemented by the 603e and EC603E microprocessor.

Table B-4. 64-Bit SPR Encoding Not Implemented

SPR decimal  spr[5-9]  spr[0-4]  Register name  Access
280          01000     11000     ASR            Supervisor
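Table B-4 lists the SPR number as two 5-bit halves. A minimal sketch of that split follows, assuming the usual PowerPC convention that the 10-bit spr field of mfspr/mtspr (instruction bits 11-20) holds the two halves in swapped order, with the low-order five bits of the SPR number first. The function names are hypothetical.

```python
def split_spr(number):
    """Return (spr[5-9], spr[0-4]) for a decimal SPR number, as listed in Table B-4.

    spr[5-9] is the high-order half of the number and spr[0-4] the low-order half.
    """
    high = (number >> 5) & 0x1F
    low = number & 0x1F
    return high, low


def spr_field(number):
    """Build the 10-bit spr field as it is placed in instruction bits 11-20.

    Assumption: the low-order half occupies the upper five bits of the field.
    """
    high, low = split_spr(number)
    return (low << 5) | high
```

For SPR 280 (ASR), `split_spr(280)` returns the pair (0b01000, 0b11000), matching the two columns of Table B-4.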
Appendix C
PowerPC 603 Processor System Design and Programming Considerations

While the PowerPC 603 microprocessor shares most of the attributes of the PowerPC 603e microprocessor, the system designer or programmer should keep in mind the 603 hardware and software differences, described in the following sections, that can require modifications to accommodate the 603 in systems designed for the 603e.

C.1 PowerPC 603 Microprocessor Hardware Considerations

The 603's hardware implementation differs from the 603e in the following ways:
- The XATS signal replaces the CSE1 signal
- Hardware support for access to direct-store segments
- Bus clock multipliers of 1:1, 2:1, 3:1, and 4:1 only
- 8-Kbyte, two-way set-associative instruction and data caches
- HID1 register not implemented in the 603

The following sections provide further information on the operation of some of the hardware features specific to the 603.

C.1.1 Hardware Support for Direct-Store Accesses

The 603 provides hardware support for direct-store bus accesses through the provision of the extended address transfer start (XATS) signal, and support for direct-store accesses in the bus interface unit. Direct-store accesses are invoked when a segment register T bit is set to 1. The operation of the XATS signal is described in the following section. The XATS signal is in the same location as the CSE1 signal on the 603e.
C.1.1.1 Extended Address Transfer Start (XATS)

The XATS signal is both an input and an output signal on the 603.

C.1.1.1.1 Extended Address Transfer Start (XATS): Output

Following are the state meaning and timing comments for the XATS output signal.

State meaning
  Asserted: Indicates that the 603 has begun a direct-store operation and that the first address cycle is valid. When asserted with the appropriate XATC signals, it is also an implied data bus request for certain direct-store operations (unless it is an address-only operation).
  Negated: Is negated during an entire memory transaction.

Timing comments
  Assertion: Coincides with the assertion of ABB.
  Negation: Occurs one bus clock cycle after the assertion of XATS.
  High impedance: Coincides with the negation of ABB.

C.1.1.1.2 Extended Address Transfer Start (XATS): Input

Following are the state meaning and timing comments for the XATS input signal.

State meaning
  Asserted: Indicates that the 603 must check for a direct-store operation reply.
  Negated: Indicates that there is no need to check for a direct-store operation reply.

Timing comments
  Assertion: May occur while ABB is asserted.
  Negation: Must occur one bus clock cycle after XATS is asserted.

C.1.2 Direct-Store Protocol Operation

The 603 defines separate memory-mapped and I/O address spaces, or segments, distinguished by the corresponding segment register T bit in the address translation logic of the 603. If the T bit is cleared, the memory reference is a normal memory-mapped access and can use the virtual memory management hardware of the 603. If the T bit is set, the memory reference is a direct-store access.

The following points should be considered for direct-store accesses:
- The use of direct-store segment accesses may have a significant impact on the performance of the 603.
- The 603 provides direct-store segment access capability for compatibility with earlier hardware I/O controllers; the capability may not be provided in future derivatives of the 603 family.
- Direct-store accesses are strongly ordered; that is, these accesses occur on the bus strictly in order with respect to the instruction stream.
- Direct-store accesses provide synchronous error reporting.

The 603 has a single bus interface to support both memory accesses and direct-store segment accesses.
The direct-store protocol for the 603 allows for the transfer of 1 to 128 bytes of data between the 603 and the bus unit controller (BUC) for each single load or store request issued by the program. The block of data is transferred by the 603 as multiple single-beat bus transactions (individual address and data tenure for each transaction) until completion. The program waits for the sequence of bus transactions to be completed so that a final completion status (error or no error) can be reported precisely with respect to the program flow. The completion status is snooped by the 603 from a bus transaction run by the BUC.

The system recognizes the assertion of the TS signal as the start of a memory-mapped access. The assertion of XATS indicates a direct-store access. This allows memory-mapped devices to ignore direct-store transactions. If XATS is asserted, the access is to a direct-store space and the following extensions to the memory access protocol apply:
- A new set of bus operations is defined. The transfer type, transfer burst, and transfer size signals are redefined for direct-store operations; they convey the opcode for the I/O transaction (see Table C-1).
- There are two beats of address for each direct-store transfer. The first beat (packet 0) provides basic address information such as the segment register and the sender tag and several control bits; the second beat (packet 1) provides additional addressing bits from the segment register and the logical address.
- The TT[0-3], TBST, and TSIZ[0-2] signals are remapped to form an 8-bit extended transfer code (XATC), which specifies a command and transfer size for the transaction. The XATC field is driven and snooped by the 603 during direct-store transactions.
- Only the data signals such as DH[0-31] and DP[0-3] are used. The lower half of the data bus and parity is ignored.
- The sender that initiated the transaction must wait for a reply from the receiver bus unit controller (BUC) before starting a new operation.
- The 603 does not burst direct-store transactions. All direct-store transactions generated by the 603 are single-beat transactions of 4 bytes or less (single data beat tenure per address tenure).

Direct-store transactions use separate arbitration for the split address and data buses and define address-only and single-beat transactions. The address-retry vehicle is identical, although there is no hardware coherency support for direct-store transactions. The ARTRY signal is useful, however, for pacing 603 transactions, effectively indicating to the 603 that the BUC is in a queue-full condition and cannot accept new data.

In addition to the extensions noted above, there are fundamental differences between memory-mapped and direct-store operations. For example, only half of the 64-bit data path is available for 603 direct-store transactions. This lowers the pin count for I/O interfaces but generally results in substantially less bandwidth than memory-mapped accesses. Additionally, load/store instructions that address direct-store segments cannot complete successfully without an error-free reply from the addressed BUC. Because normal direct-store accesses involve multiple I/O transactions (streaming), they are likely to be very long latency instructions; therefore, direct-store operations usually stall 603 instruction issue. Figure C-1 shows a direct-store tenure. Note that the I/O device response is an address-only bus transaction.

[Figure C-1. Direct-Store Tenures. The figure shows an address tenure and a data tenure, each with arbitration, transfer, and termination phases (independent address and data), followed by an I/O response with arbitration, transfer, and termination phases and no data tenure (I/O responses are address-only).]

Note that even in the best case, the use of the 603 direct-store protocol degrades performance and requires the addressed controllers to implement 603 bus master capability to generate the reply transactions.

C.1.2.1 Direct-Store Transactions

The 603 defines seven direct-store transaction operations, as shown in Table C-1. These operations permit communication between the 603 and BUCs. A single 603 store or load instruction (that translates to a direct-store access) generates one or more direct-store operations (two or more direct-store operations for loads) from the 603 and one reply operation from the addressed BUC.

Table C-1. Direct-Store Bus Operations

Operation             Address only  Direction   XATC encoding
Load start (request)  Yes           603 to I/O  0100 0000
Load immediate        No            603 to I/O  0101 0000
Load last             No            603 to I/O  0111 0000
Store immediate       No            603 to I/O  0001 0000
Store last            No            603 to I/O  0011 0000
Load reply            Yes           I/O to 603  1100 0000
Store reply           Yes           I/O to 603  1000 0000
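To make the XATC encodings concrete, here is a minimal Python sketch that packs the transfer type, transfer burst, and transfer size signals into the 8-bit code and looks the result up against Table C-1. It assumes the transfer type occupies the most significant four bits, followed by the burst bit and the three size bits; the function names and the lookup on exact table values are illustrative choices, not part of the manual.

```python
# (operation, address_only, direction) keyed by the 8-bit XATC values of Table C-1
XATC_OPERATIONS = {
    0b0100_0000: ("load start (request)", True, "603 to I/O"),
    0b0101_0000: ("load immediate", False, "603 to I/O"),
    0b0111_0000: ("load last", False, "603 to I/O"),
    0b0001_0000: ("store immediate", False, "603 to I/O"),
    0b0011_0000: ("store last", False, "603 to I/O"),
    0b1100_0000: ("load reply", True, "I/O to 603"),
    0b1000_0000: ("store reply", True, "I/O to 603"),
}


def pack_xatc(tt, tbst, tsiz):
    """Concatenate TT[0:3] (4 bits), TBST (1 bit) and TSIZ (3 bits) into one byte."""
    return ((tt & 0xF) << 4) | ((tbst & 0x1) << 3) | (tsiz & 0x7)


def decode_xatc(xatc):
    """Return the Table C-1 entry for an XATC value, or None if it is not listed."""
    return XATC_OPERATIONS.get(xatc)
```

For instance, `decode_xatc(pack_xatc(0b0111, 0, 0))` returns the entry for a load last operation, which uses the data bus and travels from the 603 to the I/O controller.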
motorola appendix c. powerpc 603 processor system design c-5 and programming considerations for the ?st beat of the address bus, the extended address transfer code (xatc) contains the i/o opcode as shown in table c-1; the opcode is formed by concatenating the transfer type, transfer burst, and transfer size signals de?ed as follows: xatc = tt[0:3]||tbst ||tsiz[0?] c.1.2.1.1 store operations there are three operations de?ed for direct-store store operations from the 603 to the buc, de?ed as follows: 1. store immediate operations transfer up to 32 bits of data each from the 603 to the buc. 2. store last operations transfer up to 32 bits of data each from the 603 to the buc. 3. store reply from the buc reveals the success/failure of that direct-store access to the 603. a direct-store store access consists of one or more data transfer operations followed by the i/o store reply operation from the buc. if the data can be transferred in one 32-bit data transaction, it is marked as a store last operation followed by the store reply operation; no store immediate operation is involved in the transfer, as shown in the following sequence: store last (from 603) store reply (from buc) however, if more data is involved in the direct-store access, there will be one or more store immediate operations. the buc can detect when the last data is being transferred by looking for the store last opcode, as shown in the following sequence: store immediate(s) store last store reply c.1.2.1.2 load operations direct-store load accesses are similar to store operations, except that the 603 latches data from the addressed buc rather than supplying the data to the buc. as with memory accesses, the 603 is the master on both load and store operations; the external system must provide the data bus grant to the 603 when the buc is ready to supply the data to the 603. 
the load request direct-store operation has no analogous store operation; it informs the addressed buc of the total number of bytes of data that the buc must provide to the 603 on the subsequent load immediate/load last operations. for direct-store load accesses, the simplest, 32-bit (or fewer) data transfer sequence is as follows:

    load request
    load last
    load reply (from buc)

however, if more data is involved in the direct-store access, there will be one or more load immediate operations. the buc can detect when the last data is being transferred by looking for the load last opcode, as seen in the following sequence:

    load request
    load imm(s)
    load last
    load reply

note that three of the seven defined operations are address-only transactions and do not use the data bus. however, unlike the memory transfer protocol, these transactions are not broadcast from one master to all snooping devices. the direct-store address-only transaction protocol strictly controls communication between the 603 and the buc.

c.1.2.2 direct-store transaction protocol details

as mentioned previously, there are two address-bus beats corresponding to two packets of information about the address. the two packets contain the sender and receiver tags, the address and extended address bits, and extra control and status bits. the two beats of the address bus (plus attributes) are shown at the top of figure c-2 as two packets. the first packet, packet 0, is then expanded to depict the xatc and address bus information in detail.
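the operation sequences and xatc opcodes described above can be sketched in a few lines of python. this is an illustrative model only, not part of the 603 documentation: the function names and the data_transfers parameter are hypothetical, and the sketch assumes that xatc bit 0 is the most significant bit, following the usual powerpc bit numbering.

```python
# xatc opcodes for packet 0, copied from table c-1.
# each 8-bit value is tt[0:3] || tbst || tsiz[0-2].
XATC_OPCODES = {
    "load request": 0b0100_0000,
    "load immediate": 0b0101_0000,
    "load last": 0b0111_0000,
    "store immediate": 0b0001_0000,
    "store last": 0b0011_0000,
    "load reply": 0b1100_0000,
    "store reply": 0b1000_0000,
}


def split_xatc(xatc):
    """Split an 8-bit xatc value into (tt, tbst, tsiz)."""
    tt = (xatc >> 4) & 0xF    # tt[0:3], the four most significant bits
    tbst = (xatc >> 3) & 0x1  # transfer burst bit
    tsiz = xatc & 0x7         # tsiz[0-2], the three least significant bits
    return tt, tbst, tsiz


def is_address_only(operation):
    """Address-only operations have tt3 (the last tt bit) equal to 0."""
    tt, _, _ = split_xatc(XATC_OPCODES[operation])
    return (tt & 0x1) == 0


def operation_sequence(kind, data_transfers):
    """List the bus operations for one direct-store access.

    kind is "load" or "store"; data_transfers (hypothetical parameter)
    is the number of data transfers of up to 32 bits each, at least 1.
    """
    if kind not in ("load", "store") or data_transfers < 1:
        raise ValueError("kind must be 'load' or 'store', data_transfers >= 1")
    ops = ["load request"] if kind == "load" else []
    ops += [kind + " immediate"] * (data_transfers - 1)
    ops += [kind + " last", kind + " reply"]
    return ops
```

calling operation_sequence("store", 1) yields the two-operation store sequence, and is_address_only picks out the request and reply operations, which matches the "address only" column of table c-1.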
c.1.2.2.1 packet 0

figure c-2 shows the organization of the first packet in a direct-store transaction. the xatc contains the i/o opcode, as discussed earlier and as shown in table c-1. the address bus contains the following:

key bit || segment register || sender tag

figure c-2. direct-store operation: packet 0 (figure not reproduced: xatc bits 0-7 carry the i/o opcode; address bus bits 0-1 are reserved, bit 2 is the key bit from the segment register, bits 3-11 are the buid, bits 12-27 come from the segment register, and bits 28-31 are the pid)

this information is organized as follows:

reserved: bits 0 and 1 of the address bus are reserved; the 603 always drives these bits to zero.

key bit: bit 2 is the key bit from the segment register (either sr[kp] or sr[ks]). kp indicates user-level access and ks indicates supervisor-level access. the 603 multiplexes the correct key bit into this position according to the current operating context (user or supervisor). (note that user- and supervisor-level refer to problem and privileged state, respectively, in the architecture specification.)

segment register: address bits 3-27 correspond to bits 3-27 of the selected segment register. note that address bits 3-11 form the 9-bit receiver tag. software must initialize these bits in the segment register to the id of the buc to be addressed; they are referred to as the buid (bus unit id) bits.

pid (sender tag): address bits 28-31 form the 4-bit sender tag. the 603 pid (processor id) comes from bits 28-31 of the 603's processor id register. the 4-bit pid tag allows a maximum of 16 processor ids to be defined for a given system. if more bits are needed for a very large multiprocessor system, for example, it is envisioned that the second-level cache (or equivalent logic) can append a larger processor tag as needed. the buc addressed by the receiver tag should latch the sender address required by the subsequent i/o reply operation.
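as a rough illustration of the packet 0 layout described above, the following python sketch assembles the 32-bit address-bus value from its fields and recovers the receiver tag. the function names are hypothetical, and the sketch assumes powerpc bit numbering, in which bit 0 is the most significant bit of the 32-bit word.

```python
def packet0_address(segment_register, key_bit, pid):
    """Build the packet 0 address-bus value (bit 0 is the most significant)."""
    sr_bits_3_27 = segment_register & 0x1FFFFFF0  # keep sr bits 3-27
    key = (key_bit & 0x1) << 29                   # key bit goes to bit 2
    sender_tag = pid & 0xF                        # pid goes to bits 28-31
    # bits 0 and 1 stay zero because they are reserved
    return key | sr_bits_3_27 | sender_tag


def receiver_tag(address):
    """Extract the 9-bit buid (address bits 3-11) from a packet 0 address."""
    return (address >> 20) & 0x1FF
```

for example, a segment register holding buid 0x0a5 in bits 3-11, a key bit of 0, and a pid of 3 gives the address 0x0a500003, from which receiver_tag recovers 0x0a5.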
c.1.2.2.2 packet 1

the second address beat, packet 1, transfers byte counts and the physical address for the transaction, as shown in figure c-3.

figure c-3. direct-store operation: packet 1 (figure not reproduced: xatc bits 0-7 carry the byte count; address bus bits 0-3 carry sr[28-31] and bits 4-31 carry the bus address)

for packet 1, the xatc is defined as follows:

load request operations: xatc contains the total number of bytes to be transferred (128 bytes maximum for 603).

immediate/last (load or store) operations: xatc contains the current transfer byte count (1 to 4 bytes).

address bits 0-31 contain the physical address of the transaction. the physical address is generated by concatenating segment register bits 28-31 with bits 4-31 of the effective address, as follows:

segment register (bits 28-31) || effective address (bits 4-31)

while the 603 provides the address of the transaction to the buc, the buc must maintain a valid address pointer for the reply.

c.1.2.3 i/o reply operations

bucs must respond to 603 direct-store transactions with an i/o reply operation, as shown in figure c-4. the purpose of this reply operation is to inform the 603 of the success or failure of the attempted direct-store access. this requires the system direct-store slave to have 603 bus mastership capability, a substantially more complex design task than bus slave implementations that use memory-mapped i/o access.

reply operations from the buc to the 603 are address-only transactions. as with packet 0 of the address bus on 603 direct-store operations, the xatc contains the opcode for the operation (see table c-1). additionally, the i/o reply operation transfers the sender/receiver tags in the first beat.
figure c-4. i/o reply operation (figure not reproduced: xatc bits 0-7 carry the i/o opcode; address bus bits 0-1 are reserved, bit 2 is the error bit, bits 3-11 are the buid from the segment register, bits 12-27 are buc-specific, and bits 28-31 are the pid)

the address bits are described in table c-2. the second beat of the address bus is reserved; the xatc and address buses should be driven to zero to preserve compatibility with future protocol enhancements.

the following sequence occurs when the 603 detects an error bit set on an i/o reply operation:

1. the 603 completes the instruction that initiated the access.
2. if the instruction is a load, the data is forwarded to the register file(s)/sequencer.
3. a direct-store error exception is generated, which transfers 603 control to the direct-store error exception handler to recover from the error.

if the error bit is not set, the 603 instruction that initiated the access completes and instruction execution resumes.

table c-2. address bits for i/o reply operations

address bits   description
0-1            reserved. these bits should be cleared for compatibility with future powerpc microprocessors.
2              error bit. it is set if the buc records an error in the access.
3-11           buid. sender tag of a reply operation. corresponds with bits 3-11 of one of the 603 segment registers.
12-27          address bits 12-27 are buc-specific and are ignored by the 603.
28-31          pid (receiver tag). the 603 effectively snoops operations on the bus and, on reply operations, compares this field to bits 28-31 of the pid register to determine if it should recognize this i/o reply.
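the field layout in table c-2 can be illustrated with a small python decoder. this is a sketch for clarity rather than a model of the 603 hardware: the function name, the dictionary keys, and the my_pid parameter are hypothetical, and bit 0 is again taken as the most significant bit of the 32-bit address.

```python
def decode_io_reply(address, my_pid):
    """Split the first address beat of an i/o reply into table c-2 fields."""
    fields = {
        "error": (address >> 29) & 0x1,          # address bit 2
        "buid": (address >> 20) & 0x1FF,         # address bits 3-11
        "buc_specific": (address >> 4) & 0xFFFF, # address bits 12-27
        "pid": address & 0xF,                    # address bits 28-31
    }
    # the reply is recognized only if the receiver tag matches
    # the low four bits of the processor id register
    fields["recognized"] = fields["pid"] == (my_pid & 0xF)
    return fields
```

a reply whose error field is 1 corresponds to the three-step error sequence listed above; a reply whose pid does not match is simply not recognized by that processor.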
system designers should note the following:

"misplaced" reply operations (that match the processor tag and arrive unexpectedly) are ignored by the 603.

external logic must assert aack for the 603, even though it is the receiver of the reply operation. aack is an input-only signal to the 603.

the 603 monitors address parity when enabled by software and xats and reply operations (load or store).

c.1.2.4 direct-store operation timing

the following timing diagrams show the sequence of events in a typical 603 direct-store load access (figure c-5) and a typical 603 direct-store store access (figure c-6). all arbitration signals except for abb and dbb have been omitted for clarity, although they are still required. note that, for either case, the number of immediate operations depends on the amount and the alignment of data to be transferred. if no more than 4 bytes are being transferred, and the data is double-word-aligned (that is, does not straddle an 8-byte address boundary), there will be no immediate operation as shown in the figures. the 603 can transfer as many as 128 bytes of data in one load or store instruction (requiring more than 33 immediate operations in the case of misaligned operands).

in figure c-5, xats is asserted with the same timing relationship as ts in a memory access. notice, however, that the address bus (and xatc) transition on the next bus clock cycle. the first of the two beats on the address bus is valid for one bus clock cycle window only, and that window is defined by the assertion of xats. the second address bus beat, however, can be extended by delaying the assertion of aack until the system has latched the address.

the load request and load reply operations, shown in figure c-5, are address-only transactions as denoted by the negated tt3 signal during their respective address tenures.
note that other types of bus operations can occur between the individual direct-store operations on the bus. the 603 involved in this transaction, however, does not initiate any other direct-store load or store operations once the first direct-store operation has begun address tenure; however, if the i/o operation is retried, other higher-priority operations can occur.

notice that, in this example (zero wait states), 13 bus clock cycles are required to transfer no more than 8 bytes of data.
figure c-5. direct-store interface load access example (timing diagram not reproduced: abb, xats, addr + xatc, dbb, dh[0-31], and ta over bus clock cycles 1-13, covering the request, immediate, last, and reply operations, each with packet 0 and packet 1 address beats; packet 1 of the reply is reserved)

figure c-6 shows a direct-store store access composed of three direct-store operations. as with the example in figure c-5, notice that data is transferred only on the 32 bits of the dh bus. as opposed to figure c-5, there is no request operation since the 603 has the data ready for the buc.

the assertion of the tea signal during a direct-store operation indicates that an unrecoverable error has occurred. if the tea signal is asserted during a direct-store operation, the tea action is delayed and the following direct-store transactions continue until all data transfers from the direct-store segment have been completed. the bus agent that asserts tea is responsible for asserting the tea signal for every direct-store transaction tenure, including the last one. the direct-store reply, in this case, is not required and will be ignored by the processor. the processor will take a machine check exception after the last direct-store data tenure has been terminated by the assertion of tea, and not before.
figure c-6. direct-store interface store access example (timing diagram not reproduced: abb, xats, addr + xatc, dbb, dh[0-31], and ta over bus clock cycles 1-10, covering the immediate, last, and reply operations, each with packet 0 and packet 1 address beats; packet 1 of the reply is reserved)

c.1.3 cse signal

the 603 employs two-way set associativity for both the instruction and data caches, in place of the four-way set associativity of the 603e. the cse signal indicates which cache set is being loaded during a cache line fill. table c-3 shows the cse signal encoding indicating the cache set selected during a cache load operation.

table c-3. cse signal encoding

cse   cache set element
0     set 0
1     set 1

c.1.4 powerpc 603 processor bus clock multiplier configuration

the 603 provides support for bus clock multipliers of 1:1, 2:1, 3:1, and 4:1. the bus clock multipliers are selected through the setting of the pll_cfg[0-3] signals as shown in table c-4.
c.1.5 powerpc 603 processor cache organization

the 603 provides two 8-kbyte, two-way set associative caches to allow the registers and execution units rapid access to instructions and data. the instruction and data caches are configured as 128 sets of two blocks. the operation of the 603's instruction and data caches is consistent with the caches in the 603e, with the exception of the reduced cache size and set associativity.

table c-4. powerpc 603 microprocessor pll configuration

each entry gives the cpu frequency in mhz, followed by the pll frequency in mhz in parentheses, for the bus frequency heading the column.

pll_cfg   cpu/sysclk   bus          bus          bus          bus          bus          bus          bus
[0-3]     ratio        16.6 mhz     20 mhz       25 mhz       33.3 mhz     40 mhz       50 mhz       66.6 mhz
0000      1:1          -            -            -            -            -            -            66.6 (133)
0001      1:1          -            -            -            33.3 (133)   40 (160)     50 (200)     -
0010      1:1          16.6 (133)   20 (160)     25 (200)     -            -            -            -
0100      2:1          -            -            -            66.6 (133)   80 (160)     100 (200)    -
0101      2:1          33.3 (133)   40 (160)     50 (200)     -            -            -            -
1000      3:1          -            -            75 (150)     100 (200)    -            -            -
1001      3:1          50 (200)     -            -            -            -            -            -
1100      4:1          66.6 (133)   80 (160)     100 (200)    -            -            -            -
0011      pll bypass
1111      clock off

notes:
1. some pll configurations may select bus, cpu, or pll frequencies which are not useful, not supported, or not tested for by the 603. pll frequencies (shown in parentheses in table c-4) should not fall below 133 mhz, and should not exceed 200 mhz.
2. in pll bypass mode, the sysclk input signal clocks the internal processor directly, the pll is disabled, and the bus mode is set for 1:1 mode operation. this mode is intended for factory use only. note that the ac timing specifications given in this document do not apply in pll bypass mode.
3. in clock-off mode, no clocking occurs inside the 603 regardless of the sysclk input.
4. pll_cfg0-pll_cfg1 signals select the cpu-to-bus ratio (1:1, 2:1, 3:1, 4:1); pll_cfg2-pll_cfg3 signals select the cpu-to-pll multiplier (x2, x4, x8).
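the relationship between pll_cfg, bus frequency, cpu frequency, and pll frequency in table c-4 can be checked with a short python sketch. the function name and return format are hypothetical; the encodings are read off the listed rows of table c-4 and note 4, and rounding the pll frequency to the nearest mhz before the range check is an assumption made to match the nominal values printed in the table.

```python
# pll_cfg[0-1] selects the cpu-to-bus ratio (note 4 and table c-4 rows).
RATIOS = {0b00: 1, 0b01: 2, 0b10: 3, 0b11: 4}
# pll_cfg[2-3] selects the cpu-to-pll multiplier (x2, x4, x8).
MULTIPLIERS = {0b00: 2, 0b01: 4, 0b10: 8}
# only these settings appear as normal operating rows in table c-4.
LISTED = {0b0000, 0b0001, 0b0010, 0b0100, 0b0101, 0b1000, 0b1001, 0b1100}


def decode_pll_cfg(pll_cfg, bus_mhz):
    """Return (mode, cpu_mhz, pll_mhz, pll_in_range) for a 4-bit pll_cfg."""
    if pll_cfg == 0b0011:
        return ("pll bypass", None, None, None)
    if pll_cfg == 0b1111:
        return ("clock off", None, None, None)
    if pll_cfg not in LISTED:
        raise ValueError("pll_cfg setting not listed in table c-4")
    ratio = RATIOS[pll_cfg >> 2]
    multiplier = MULTIPLIERS[pll_cfg & 0b11]
    cpu_mhz = bus_mhz * ratio
    pll_mhz = cpu_mhz * multiplier
    # note 1: the pll frequency should stay between 133 and 200 mhz;
    # rounding to the nearest mhz is an assumption of this sketch
    in_range = 133 <= round(pll_mhz) <= 200
    return ("%d:1" % ratio, cpu_mhz, pll_mhz, in_range)
```

for instance, pll_cfg 1000 with a 25 mhz bus gives a 75 mhz cpu clock and a 150 mhz pll clock, which is the "75 (150)" entry in the 3:1 row.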
c.1.5.1 instruction cache organization

the organization of the instruction cache is shown in figure c-7. each cache block contains eight contiguous words from memory that are loaded from an eight-word boundary (that is, bits a27-a31 of the logical (effective) addresses are zero); as a result, cache blocks are aligned with page boundaries.

note that address bits a20-a26 provide an index to select a set. bits a27-a31 select a byte within a block. the tags consist of bits pa0-pa19. address translation occurs in parallel, such that higher-order bits (the tag bits in the cache) are physical.

note that the replacement algorithm is strictly an lru algorithm; that is, the least recently used block is filled with new instructions on a cache miss.

figure c-7. instruction cache organization (figure not reproduced: two sets, set 0 and set 1, each holding block 0 through block 127; each block consists of an address tag and 8 words)

c.1.5.2 data cache organization

the organization of the data cache is shown in figure c-8. each cache block contains eight contiguous words from memory that are loaded from an eight-word boundary (that is, bits a27-a31 of the logical (effective) addresses are zero); as a result, cache blocks are aligned with page boundaries.
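the address split used by both 603 caches (a 20-bit tag from pa0-pa19, a 7-bit set index from a20-a26, and a 5-bit byte offset from a27-a31) can be sketched in python as follows. the names are hypothetical, and the sketch assumes a 32-bit physical address with bit 0 as the most significant bit.

```python
BLOCK_BYTES = 32  # eight 4-byte words per cache block
NUM_SETS = 128    # selected by address bits a20-a26
WAYS = 2          # two-way set associative


def split_cache_address(physical_address):
    """Return (tag, set_index, byte_offset) for a 32-bit physical address."""
    byte_offset = physical_address & 0x1F        # bits a27-a31
    set_index = (physical_address >> 5) & 0x7F   # bits a20-a26
    tag = (physical_address >> 12) & 0xFFFFF     # bits pa0-pa19
    return tag, set_index, byte_offset


# 128 sets x 2 blocks x 32 bytes gives the 8-kbyte cache size.
CACHE_BYTES = NUM_SETS * WAYS * BLOCK_BYTES
```

because the index and offset together cover only the low 12 bits of the address, they fall within a 4-kbyte page, which is why the set can be selected while address translation proceeds in parallel.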
note that address bits a20-a26 provide an index to select a set. bits a27-a31 select a byte within a block. the tags consist of bits pa0-pa19. address translation occurs in parallel, such that higher-order bits (the tag bits in the cache) are physical.

note that the replacement algorithm is strictly an lru algorithm; that is, the least recently used block is filled with new data on a cache miss.

figure c-8. data cache organization (figure not reproduced: set 0 through set 127, each holding block 0 and block 1; each block consists of an address tag and 8 words)

c.1.6 pll configuration (pll_cfg[0-3]) input

the 603 operates as described in section 7.2.12.3, "pll configuration (pll_cfg[0-3]) input," except for the following:

to avoid incorrect operation of the pll, the clock input to the sysclk signal input should be stable and within the frequency range specified for the selected pll_cfg configuration during power-up, during normal operation, or when exiting the sleep power-saving mode.

c.1.7 address pipelining and split-bus transactions

the 603 operates as described in section 8.2.2, "address pipelining and split-bus transactions," except for the following:

note that in multiprocessor systems, addresses associated with cache line loads are not snooped between the third and fourth beat during the data tenure when the system is configured for 64-bit bus operation.
when configured for 32-bit bus operation, cache line loads are not snooped between the sixth and eighth beats. to ensure memory coherency, multiprocessor systems should avoid pipelined operation, or disallow snooping during the last data beat of a cache load operation.

c.1.8 data bus arbitration

the 603 operates as described in section 8.4.1, "data bus arbitration," except for the following:

when the 603 is configured for 1:1 processor to bus clock operation and dbg is always held asserted, multiple single-beat writes will cause incorrect data to be written to memory. the dbg signal should only be asserted when the data tenure can be started on the following bus cycle.

c.2 powerpc 603 processor software considerations

when developing software for the 603, the programmer should note the following differences from the 603e:

the 603 supports direct-store accesses; setting t = 1 in a segment register does not result in a dsi exception.

store instructions have two-cycle latency and two-cycle throughput.

the 603 does not perform integer add or compare instructions in the sru.

the 603 does not implement the key bit (bit 12) in srr1 to provide information about memory protection violations prior to page table search operations.

hid1 is not implemented by the 603; no read-only access to the pll_cfg signal configuration is provided.

the pvr value for the 603 is 0x0003.

the following sections provide further information on the 603 attributes that may affect software written for the 603e.

c.2.1 direct-store interface address translation

with address translation enabled, all memory accesses generated by the 603 map to a segment descriptor in the segment table. if t = 1 for the selected segment descriptor and there are no bat hits, the access maps to the direct-store interface, invoking a specific bus protocol for accessing some special-purpose i/o devices. direct-store segments are provided for power compatibility.
as the direct-store interface is present only for compatibility with existing i/o devices that used this interface and the direct-store interface protocol is not optimized for performance, its use is discouraged. the selection of address translation type differs for instruction and data accesses only in that instruction accesses are not allowed from direct-store segments; attempting to fetch an instruction from a direct-store segment causes an isi exception. applications that require low latency load/store access to external address space should use memory-mapped i/o, rather than the direct-store interface. refer to chapter 5, "memory management," for additional information about address translation and memory accesses.

c.2.1.1 direct-store segment translation summary flow

figure c-9 shows the flow used by the mmu when direct-store segment address translation is selected. in the case of a floating-point load or store operation to a direct-store segment, other implementations may not take an alignment exception, as is allowed by the powerpc architecture. in the case of an eciwx, ecowx, lwarx, or stwcx. instruction, the 603 sets the dsisr register as shown and causes the dsi exception.

figure c-9. direct-store segment translation flow (figure not reproduced; the flow for a segment with t = 1 is summarized below)

    instruction access: isi exception, srr1[3] <- 1
    data access:
        floating-point load or store: alignment exception (optional to the powerpc architecture; implemented in the 603)
        eciwx, ecowx, lwarx, or stwcx. instruction: dsi exception, dsisr[5] <- 1
        cache instruction (dcbt, dcbtst, dcbf, dcbi, dcbst, dcbz, or icbi): no-op
        otherwise: perform direct-store interface access
a direct-store access occurs when a data access is initiated and sr[t] is set. in the 603, msr[dr] is a don't care for this case. the following apply for direct-store accesses:

floating-point loads and stores to direct-store segments always cause an alignment exception, regardless of operand alignment.

lwarx or stwcx. instructions that map into a direct-store segment always cause a dsi exception. however, if the instruction crosses a segment boundary, an alignment exception is taken instead.

c.2.1.2 direct-store interface accesses

when the address translation process determines that the segment descriptor has t = 1, direct-store interface address translation is selected; no reference is made to the page tables, and referenced and changed bits are not updated. these accesses are performed as if the wimg bits were 0b0101; that is, caching is inhibited, the accesses bypass the cache, hardware-enforced coherency is not required, and the accesses are considered guarded.

the specific protocol invoked to perform these accesses involves the transfer of address and data information in packets; however, the powerpc oea does not define the exact hardware protocol used for direct-store interface accesses. some instructions cause multiple address/data transactions to occur on the bus. in this case, the address for each transaction is handled individually with respect to the dmmu.

the following data is sent by the 603 to the memory controller in the protocol (two packets consisting of address-only cycles).

packet 0:

one of the kx bits (ks or kp) is selected to be the key as follows: for supervisor accesses (msr[pr] = 0), the ks bit is used and kp is ignored; for user accesses (msr[pr] = 1), the kp bit is used and ks is ignored.

the contents of bits 3-31 of the segment register, which is the buid field concatenated with the "controller-specific" field.
packet 1: sr[28-31] concatenated with the 28 lower-order bits of the effective address, ea4-ea31.

c.2.1.3 direct-store segment protection

page-level memory protection as described in section 5.4.2, "page memory protection," is not provided for direct-store segments. the appropriate key bit (ks or kp) from the segment descriptor is sent to the memory controller, and the memory controller implements any protection required. frequently, no such mechanism is provided; the fact that a direct-store segment is mapped into the address space of a process may be regarded as sufficient authority to access the segment.
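the key selection and the packet 1 address formation described in section c.2.1.2 can be sketched in python. the function names are hypothetical, and the sketch assumes 32-bit register values with bit 0 as the most significant bit, so that sr[28-31] is the low nibble of the segment register and ea4-ea31 are the low 28 bits of the effective address.

```python
def select_key(msr_pr, ks, kp):
    """Pick the key bit sent in packet 0.

    supervisor accesses (msr[pr] = 0) use ks; user accesses
    (msr[pr] = 1) use kp.
    """
    return kp if msr_pr else ks


def packet1_address(segment_register, effective_address):
    """Concatenate sr[28-31] with ea4-ea31 to form the packet 1 address."""
    sr_28_31 = segment_register & 0xF          # low four bits of the sr
    ea_4_31 = effective_address & 0x0FFFFFFF   # low 28 bits of the ea
    return (sr_28_31 << 28) | ea_4_31
```

with a segment register whose bits 28-31 hold 0xc and an effective address of 0xf1234567, the packet 1 address is 0xc1234567; the top four bits of the effective address are not placed on the bus in this packet.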
c.2.1.4 instructions not supported in direct-store segments

the following instructions are not supported at all and cause a dsi exception in the 603 (with dsisr[5] set) when issued with an effective address that selects a segment descriptor that has t = 1 (or when msr[dr] = 0):

    lwarx
    stwcx.
    eciwx
    ecowx

c.2.1.5 instructions with no effect in direct-store segments

the following instructions are executed as no-ops by the 603 when issued with an effective address that selects a segment where t = 1:

    dcbt
    dcbtst
    dcbf
    dcbi
    dcbst
    dcbz
    icbi

c.2.2 store instruction latency

the store instructions executed by the 603 execute with 2-cycle latency and 2-cycle throughput, in contrast to the 2-cycle latency and 1-cycle throughput of the 603e. table c-5 provides the latencies for the store instructions executed by the 603.

table c-5. store instruction timing

primary   extended   mnemonic   unit   cycles
31        151        stwx       lsu    2:2
31        183        stwux      lsu    2:2
31        215        stbx       lsu    2:2
31        247        stbux      lsu    2:2
31        407        sthx       lsu    2:2
31        438        ecowx      lsu    2:2
31        439        sthux      lsu    2:2
31        662        stwbrx     lsu    2:2
31        663        stfsx      lsu    2:2
31        695        stfsux     lsu    2:2
31        727        stfdx      lsu    2:2
table c-5. store instruction timing (continued)

primary   extended   mnemonic   unit   cycles
31        918        sthbrx     lsu    2:2
31        983        stfiwx     lsu    2:2
36        ---        stw        lsu    2:2
37        ---        stwu       lsu    2:2
38        ---        stb        lsu    2:2
39        ---        stbu       lsu    2:2
44        ---        sth        lsu    2:2
45        ---        sthu       lsu    2:2
52        ---        stfs       lsu    2:2
53        ---        stfsu      lsu    2:2
54        ---        stfd       lsu    2:2
55        ---        stfdu      lsu    2:2

c.2.3 instruction execution by system register unit

unlike the 603e, the 603's sru does not execute integer add and compare instructions. table c-6 lists the instructions executed by the 603's sru, and the number of cycles required for execution.

table c-6. system register instructions

primary   extended   mnemonic                 unit   cycles
17        --1        sc                       sru    3
19        050        rfi                      sru    3
19        150        isync                    sru    1&
31        083        mfmsr                    sru    1
31        146        mtmsr                    sru    2
31        210        mtsr                     sru    2
31        242        mtsrin                   sru    2
31        339        mfspr (not i/dbats)      sru    1
31        339        mfspr (dbats)            sru    3&
31        339        mfspr (ibats)            sru    3&
31        467        mtspr (not ibats)        sru    2 (xer-&)
31        467        mtspr (ibats)            sru    2&
31        595        mfsr                     sru    3&
31        598        sync                     sru    1&
c.2.4 machine check exception (0x00200)

the 603 operates as described in section 4.5.2, "machine check exception (0x00200)," with the exception of the following:

to ensure memory coherency following the assertion of tea, the instruction cache should be invalidated by setting and clearing hid0[icfi], and the data cache should be flushed, before performing any load or store operations or executing any data cache management instructions other than dcbf.

note that an assertion of tea during an instruction fetch will result in an immediate instruction refetch before the machine check exception is taken, which will result in a second assertion of the tea signal. the second assertion of tea while the machine check exception is pending from the previous tea assertion will result in the 603 entering the checkstop state instead of taking the machine check exception.

c.2.5 instruction address breakpoint exception (0x01300)

the 603 operates as described in section 4.5.15, "instruction address breakpoint exception (0x01300)," with the exception of the following:

to avoid spurious iabr exceptions, the iabr special-purpose register should not be loaded with an address that falls within the same cache line as a disabled, but matching iabr address.

c.2.6 cache control instructions

the 603 operates as described in section 3.7, "cache control instructions," with the exception of the following:

note that loop structures that contain long sequences of dcbz or dcbi instructions may cause snoop performance degradation. programmers can improve snoop performance by inserting no-op instructions (ori 0,0,0) between dcbz or dcbi instructions, replacing the dcbz or dcbi instructions with a sequence of write-through store operations, using the decrementer to generate a periodic exception to allow snoop activity, or mapping the address space where the dcbz or dcbi instructions execute as global (m = 1).
table c-6. system register instructions (continued)

primary   extended   mnemonic   unit   cycles
31        659        mfsrin     sru    3&
31        854        eieio      sru    1
31        371        mftb       sru    1
31        467        mttb       sru    1

note: cycle times marked with "&" require a variable number of cycles due to serialization.
note that the use of the dcbz instruction in a multiprocessor system can result in loss of data coherency if the dcbz instruction is executed in memory space marked as global (m = 1). programmers should use software coherency protocols to ensure that no processor can perform a kill operation to memory used by another processor.
glossary of terms and abbreviations

the glossary contains an alphabetical list of terms, phrases, and abbreviations used in this book. some of the terms and definitions included in the glossary are reprinted from ieee std 754-1985, ieee standard for binary floating-point arithmetic, copyright ©1985 by the institute of electrical and electronics engineers, inc. with the permission of the ieee.

atomic. a bus access that attempts to be part of a read-write operation to the same address uninterrupted by any other access to that address (the term refers to the fact that the transactions are indivisible). the powerpc 603e microprocessor initiates the read and write separately, but signals the memory system that it is attempting an atomic operation. if the operation fails, status is kept so that the 603e can try again. the 603e implements atomic accesses through the lwarx/stwcx. instruction pair.

beat. a single state on the 603e bus interface that may extend across multiple bus cycles. a 603e transaction can be composed of multiple address or data beats.

biased exponent. the sum of the exponent and a constant (bias) chosen to make the biased exponent's range non-negative.

big-endian. a byte-ordering method in memory where the address n of a word corresponds to the most significant byte. in an addressed memory word, the bytes are ordered (left to right) 0, 1, 2, 3, with 0 being the most significant byte.

boundedly undefined. the results of attempting to execute a given instruction are said to be boundedly undefined if they could have been achieved by executing an arbitrary sequence of defined instructions, in valid form, starting in the state the machine was in before attempting to execute the given instruction. boundedly undefined results for a given instruction may vary between implementations, and between execution attempts in the same implementation.
branch folding. a technique of removing the branch instruction from the instruction sequence.

burst. a multiple beat data transfer whose total size is typically equal to a cache block (in the 603e, a 32-byte block).

bus clock. clock that causes the bus state transitions.

bus master. the owner of the address or data bus; the device that initiates or requests the transaction.

cache. high-speed memory containing recently accessed data and/or instructions (subset of main memory).

cache block. the cacheable unit for a powerpc processor. the size of a cache block may vary among processors. for the 603e, it is one cache line (8 words).

cache coherency. caches are coherent if a processor performing a read from its cache is supplied with data corresponding to the most recent value written to memory or to another processor's cache.

cast-outs. cache block that must be written to memory when a snoop miss causes the least recently used block with modified data to be replaced.

context synchronization. context synchronization is the result of specific instructions (such as sc or rfi) or when certain events occur (such as an exception). during context synchronization, all instructions in execution complete past the point where they can produce an exception; all instructions in execution complete in the context in which they began execution; all subsequent instructions are fetched and executed in the new context.

copy-back operation. a cache operation in which a cache line is copied back to memory to enforce cache coherency. copy-back operations consist of snoop push-out operations and cache cast-out operations.

denormalized number. a nonzero floating-point number whose exponent has a reserved value, usually the format's minimum, and whose explicit or implicit leading significand bit is zero.
direct-store segment access. an access to an i/o address space. the 603 defines separate memory-mapped and i/o address spaces, or segments, distinguished by the corresponding segment register t bit in the address translation logic of the 603. if the t bit is cleared, the memory reference is a normal memory-mapped access and can use the virtual memory management hardware of the 603. if the t bit is set, the memory reference is a direct-store access.

exception. an unusual or error condition encountered by the processor that results in special processing.

exception handler. a software routine that executes when an exception occurs. normally, the exception handler corrects the condition that caused the exception, or performs some other meaningful task (such as aborting the program that caused the exception). the addresses of the exception handlers are defined by a two-word exception vector that is branched to automatically when an exception occurs.

exclusive state. emi state (e) in which only one caching device contains data that is also in system memory.

execution synchronization. all instructions in execution are architecturally complete before beginning execution (appearing to begin execution) of the next instruction. similar to context synchronization but doesn't force the contents of the instruction buffers to be deleted and refetched.

exponent. the component of a binary floating-point number that normally signifies the integer power to which two is raised in determining the value of the represented number. occasionally the exponent is called the signed or unbiased exponent.

feed-forwarding. a 603e feature that reduces the number of clock cycles that an execution unit must wait to use a register.
when the source register of the current instruction is the same as the destination register of the previous instruction, the result of the previous instruction is routed to the current instruction at the same time that it is written to the register ?e. with feed-forwarding, the destination bus is gated to the waiting execution unit over the appropriate source bus, saving the cycles which would be used for the write and read. floating-point unit . the functional unit in the 603e processor responsible for executing all ?ating-point instructions. (not supported on the EC603E microprocessor) e f
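The T-bit rule in the direct-store segment access entry above can be sketched as a simple mask test. This is only an illustration: it assumes the T bit is the most significant bit of a 32-bit segment register value (bit 0 in PowerPC big-endian bit numbering), which the glossary itself does not state, and the names SR_T_BIT and access_kind are hypothetical.

```python
# Assumption: T is the most significant bit of the 32-bit segment register.
SR_T_BIT = 0x80000000


def access_kind(segment_register: int) -> str:
    """Return the kind of access selected by a segment register value."""
    if segment_register & SR_T_BIT:
        return "direct-store"    # T bit set: I/O (direct-store) segment
    return "memory-mapped"       # T bit cleared: normal translated access


print(access_kind(0x80000000))
print(access_kind(0x00123456))
```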
Flush. An operation that causes a modified cache block to be invalidated and the data to be written to memory.
Fraction. The field of the significand that lies to the right of its implied binary point.
General-purpose register. Any of the 32 registers in the 603e register file. These registers provide the source operands and destination results for all 603e data manipulation instructions. Load instructions move data from memory to registers, and store instructions move data from registers to memory.
IEEE 754. A standard written by the Institute of Electrical and Electronics Engineers that defines operations of binary floating-point arithmetic and representations of binary floating-point numbers.
Instruction queue. A holding place for instructions fetched from the current instruction stream.
Integer unit. The functional unit in the 603e responsible for executing all integer instructions.
Interrupt. An external signal that causes the 603e to suspend current execution and take a predefined exception.
Invalid state. MEI state (I) that indicates that the cache block does not contain valid data.
Kill. An operation that causes a cache block to be invalidated.
Latency. The number of clock cycles necessary to execute an instruction and make ready the results of that instruction.
Little-endian. A byte-ordering method in memory where the address n of a word corresponds to the least significant byte. In an addressed memory word, the bytes are ordered (left to right) 3, 2, 1, 0, with 3 being the most significant byte.
Livelock. A state in which processors interact in a way such that no processor makes progress.
Mantissa. The decimal part of a logarithm.
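The little-endian entry above can be illustrated by packing one 32-bit word in both byte orders. This is a minimal sketch using Python's standard struct module; the word value is arbitrary and chosen only so that each byte is easy to recognize.

```python
import struct

word = 0x0A0B0C0D

# "<I" packs an unsigned 32-bit word in little-endian order:
# the byte at the lowest address (index 0) is the least significant byte.
little = struct.pack("<I", word)

# ">I" packs the same word in big-endian order for comparison:
# the byte at the lowest address is the most significant byte.
big = struct.pack(">I", word)

print(little.hex())
print(big.hex())
```

In the little-endian result, index 0 holds 0x0D (byte 0, least significant) and index 3 holds 0x0A (byte 3, most significant).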
Memory-mapped accesses. Accesses whose addresses use the segmented or block address translation mechanisms provided by the MMU and that occur externally with the bus protocol defined for memory.
Memory coherency. Refers to memory agreement between caches and system memory (for example, MEI cache coherency).
Memory consistency. Refers to levels of memory with respect to a single processor and system memory (for example, on-chip cache, secondary cache, and system memory).
Memory-forced I/O controller interface access. These accesses are made to memory space. They do not use the extensions to the memory protocol described for I/O controller interface accesses, and they bypass the page- and block-translation and protection mechanisms.
Memory management unit. The functional unit in the 603e that translates the logical address bits to physical address bits.
Modified state. MEI state (M) in which one, and only one, caching device has the valid data for that address. The data at this address in external memory is not valid.
NaN. An abbreviation for not a number; a symbolic entity encoded in floating-point format. There are two types of NaNs: signaling NaNs and quiet NaNs.
No-op. No-operation. A single-cycle operation that does not affect registers or generate bus activity.
Out-of-order. An operation is said to be out-of-order when it is not guaranteed to be required by the sequential execution model, such as the execution of an instruction that follows another instruction that may alter the instruction flow. For example, execution of instructions in an unresolved branch is said to be out-of-order, as is the execution of an instruction behind another instruction that may yet cause an exception. The results of operations that are performed out-of-order are not committed to architected resources until it can be ensured that these results adhere to the in-order, or sequential execution model.
Overflow. An error condition that occurs during arithmetic operations when the result cannot be stored accurately in the destination register(s). For example, if two 32-bit numbers are added, the sum may require 33 bits due to carry. Since the 32-bit registers of the 603e cannot represent this sum, an overflow condition occurs.
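The 33-bit sum described in the overflow entry above can be worked through with ordinary integer arithmetic. The sketch below models a 32-bit register by masking; the helper names add32 and signed_overflow are hypothetical, and the code does not model how the 603e itself records carry or overflow.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def add32(a: int, b: int) -> tuple:
    """Add two 32-bit values; return (32-bit result, carry out of bit 32)."""
    full = (a & MASK32) + (b & MASK32)   # may need 33 bits
    return full & MASK32, full >> 32


def signed_overflow(a: int, b: int) -> bool:
    """True if a + b overflows when both are read as signed 32-bit values."""
    result, _ = add32(a, b)
    # Overflow: operands share a sign and the result's sign differs.
    return bool(~(a ^ b) & (a ^ result) & SIGN32)


print(add32(0xFFFFFFFF, 0x00000001))            # sum needs a 33rd bit
print(signed_overflow(0x7FFFFFFF, 0x00000001))  # largest positive + 1
```

The first call shows the truncated 32-bit result together with the carry that did not fit; the second shows a signed sum whose true value cannot be represented in 32 bits.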
Packet. A term used in the 603 with respect to direct-store operations.
Page. A 4-Kbyte area of memory, aligned on a 4-Kbyte boundary.
Park. The act of allowing a bus master to maintain mastership of the bus without having to arbitrate.
Pipelining. A technique that breaks instruction execution into distinct steps so that multiple steps can be performed at the same time.
Precise exceptions. The pipeline can be stopped so the instructions that preceded the faulting instruction can complete, and subsequent instructions can be executed following the execution of the exception handler. The system is precise unless one of the imprecise modes for invoking the floating-point enabled exception is in effect.
Quiesce. To come to rest. The processor is said to quiesce when an exception is taken or a sync instruction is executed. The instruction stream is stopped at the decode stage and executing instructions are allowed to complete to create a controlled context for instructions that may be affected by out-of-order, parallel execution. See context synchronization.
Quiet NaNs. Propagate through almost every arithmetic operation without signaling exceptions. These are used to represent the results of certain invalid operations, such as invalid arithmetic operations on infinities or on NaNs, when invalid.
Scan interface. The 603e's test interface.
Shadowing. Shadowing allows a register to be updated by instructions that are executed out of order without destroying machine state information.
Signaling NaNs. Signal the invalid operation exception when they are specified as arithmetic operands.
Significand. The component of a binary floating-point number that consists of an explicit or implicit leading bit to the left of its implied binary point and a fraction field to the right.
Slave. The device addressed by a master device. The slave is identified in the address tenure and is responsible for supplying or latching the requested data for the master during the data tenure.
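To make the quiet NaN and signaling NaN entries above concrete, the following sketch tells the two apart in a single-precision bit pattern. It assumes the common encoding in which the most significant fraction bit distinguishes a quiet NaN (bit set) from a signaling NaN (bit clear); the glossary does not spell out the encoding, and the function name nan_kind is hypothetical.

```python
QUIET_BIT = 0x00400000  # assumption: most significant bit of the fraction


def nan_kind(bits: int) -> str:
    """Classify a 32-bit single-precision pattern as a quiet or signaling NaN."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent != 0xFF or fraction == 0:
        return "not a nan"   # finite value, zero, or infinity
    return "quiet" if fraction & QUIET_BIT else "signaling"


print(nan_kind(0x7FC00000))
print(nan_kind(0x7F800001))
print(nan_kind(0x7F800000))
```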
Snooping. Monitoring addresses driven by a bus master to detect the need for coherency actions.
Snoop push. Write-backs due to a snoop hit. The block will transition to an invalid or exclusive state.
Split-transaction. A transaction with independent request and response tenures.
Split-transaction bus. A bus that allows address and data transactions from different processors to occur independently.
Static branch prediction. Mechanism by which software (for example, compilers) can give a hint to the machine hardware about the direction the branch is likely to take.
Superscalar machine. A machine that can issue multiple instructions concurrently from a conventional linear instruction stream.
Supervisor mode. The privileged operation state of the 603e. In supervisor mode, software can access all control registers and can access the supervisor memory space, among other privileged operations.
Tenure. The period of bus mastership. For the 603e, there can be separate address bus tenures and data bus tenures. A tenure consists of three phases: arbitration, transfer, and termination.
Transaction. A complete exchange between two bus devices. A transaction is minimally comprised of an address tenure; one or more data tenures may be involved in the exchange. There are two kinds of transactions: address/data and address-only.
Transfer termination. Signal that refers to both signals that acknowledge the transfer of individual beats (of both single-beat transfer and individual beats of a burst transfer) and to signals that mark the end of the tenure.
Underflow. An error condition that occurs during arithmetic operations when the result cannot be represented accurately in the destination register. For example, underflow can happen if two floating-point fractions are multiplied and the result is a single-precision number. The result may require a larger exponent and/or mantissa than the single-precision format makes available. In other words, the result is too small to be represented accurately.
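The underflow entry above can be demonstrated by multiplying two small values and storing the product in single precision. This is a sketch only: Python floats are double precision, so the hypothetical helper to_single rounds a value to single precision through the struct module, and nothing here models the 603e's exception reporting.

```python
import struct


def to_single(value: float) -> float:
    """Round a Python float (double precision) to single precision."""
    return struct.unpack(">f", struct.pack(">f", value))[0]


a = to_single(1e-30)
b = to_single(1e-30)

product = a * b               # about 1e-60, still representable as a double
stored = to_single(product)   # too small for single precision

print(product)
print(stored)
```

The double-precision product is nonzero, but the value stored in single precision is not, because the result needs a smaller exponent than the single-precision format provides.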
User mode. The unprivileged operating state of the 603e. In user mode, software can only access certain control registers and can only access user memory space. No privileged operations can be performed.
Write-through. A memory update policy in which all processor write cycles are written to both the cache and memory.
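The write-through policy defined above can be contrasted with copy-back behavior in a toy model. The sketch below is a deliberately simplified illustration, not a description of the 603e cache: the class ToyCache, its dictionaries, and its method names are all hypothetical, and it ignores cache blocks, replacement, and the MEI states.

```python
class ToyCache:
    """A one-level cache in front of a memory, both modeled as dictionaries."""

    def __init__(self, write_through: bool):
        self.write_through = write_through
        self.lines = {}        # address -> cached value
        self.modified = set()  # addresses newer in the cache than in memory
        self.memory = {}       # address -> value in memory

    def write(self, address: int, value: int) -> None:
        self.lines[address] = value
        if self.write_through:
            # Write-through: every write updates both cache and memory.
            self.memory[address] = value
        else:
            # Copy-back: memory is updated later, so mark the data modified.
            self.modified.add(address)

    def copy_back(self, address: int) -> None:
        """Write modified data back to memory."""
        if address in self.modified:
            self.memory[address] = self.lines[address]
            self.modified.discard(address)


wt = ToyCache(write_through=True)
wt.write(0x100, 7)

cb = ToyCache(write_through=False)
cb.write(0x100, 7)
stale_before_copy_back = 0x100 not in cb.memory
cb.copy_back(0x100)
```

With write-through enabled, memory holds the new value immediately after the write; in the copy-back case, memory is only updated once copy_back runs.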
Attention! This book is a companion to the PowerPC Microprocessor Family: The Programming Environments, referred to as the Programming Environments Manual. Note that the companion Programming Environments Manual exists in two versions. See the preface for a description of the following two versions:
PowerPC Microprocessor Family: The Programming Environments, Rev 1. Order #: MPCFPE/AD
PowerPC Microprocessor Family: The Programming Environments for 32-Bit Microprocessors, Rev 1. Order #: MPCFPE32B/AD
Call the Motorola LDC at 1-800-441-2447 (website: http://ldc.nmd.com) or contact your local sales office to obtain copies.

